JOURNAL OF CONSULTING PSYCHOLOGY

VOLUME 19, NUMBER 5, OCTOBER 1955

AMERICAN PSYCHOLOGICAL ASSOCIATION








October, 1955 Vol. 19, No. 5 


Contents 


Predicting Intelligence from the Rorschach: Stewart G. Armitage, Paul D. Greenberg, David Pearl, David G. Berger, and Paul G. Daston

Validity of the Grassi-Fairfield Block Substitution Test in Differential Diagnosis: Pearl Hewis

The Rorschach as a Means of Predicting Treatment Outcome: Gordon Filmer-Bennett

Differential Responses of Normals, Psychoneurotics, and Psychotics on Rorschach Determinant Shift: Bernard A. Stotsky

Manifest Anxiety and Rorschach Performance in a Chronic Patient Population: Leonard D. Goodstein and Leo Goldberger

The Reliability and Validity of the Rotter Incomplete Sentences Test: Ruth Churchill and Vaughn J. Crandall

Predictive Behavior and Personal Adjustment: James Bieri, Edward Blacharsky, and J. Wihem Bead

Success in Psychotherapy as a Function of Certain Actuarial Variables: Desmond S. Cartwright

Perceived Parental Attitudes, the Self, and Security: Sidney M. Jourard and Richard M. Remy

Children’s Responses to Human and Animal Stories and Pictures: Nancy A. Boyd and George Mandler

The Iowa Picture Interpretation Test: A Multiple-Choice Variation of the TAT: John R. Hurley

The Discriminative Ability of the Blacky Pictures with Ulcer Patients: Lewis Bernstein and Philip H. Chase

Evidence for the Validity of the Children’s Form of the Picture-Frustration Study: Eugene E. Levitt and William H. Lyle, Jr.

Perceptual Tests and Acute and Chronic Status as Predictors of Improvement in Psychotic Patients: Sylvia L. Sonder

The Relation of the Trail Making Test to Organic Brain Damage: Ralph M. Reitan

Goal-Setting Rigidity in an Ambiguous Situation: Seymour L. Zelen

A Comparison of Raven’s Progressive Matrices (1938) with the ACE Psychological Examination and the Otis Gamma Mental Ability Test: Byron J. Bolin

The Taylor Manifest Anxiety Scale and Intelligence: Mark S. — ner, Jr., Eugene Sersen, and M. E. Tresselt

New Books and Tests








Brief Reports 


The Journal of Consulting Psychology will 
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 

An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 

2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 

4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 


Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 


To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 70 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 
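The margin rule above is simple arithmetic on typewriter pitch. The sketch below assumes the usual pitches of 12 characters per inch for elite type and 10 for pica. The notice does not state these pitches; they are an assumption. The sketch reproduces the 3.5- and 4.2-inch line lengths and the approximate character budget of a 70-line report.

```python
# Minimal sketch of the Brief Report margin arithmetic.
# Assumption: elite type = 12 characters per inch, pica = 10 characters per inch.
ELITE_CPI = 12
PICA_CPI = 10
CHARS_PER_LINE = 42   # average line length required by the notice
MAX_LINES = 70        # line quota, excluding title and author lines

def line_length_inches(chars: int, chars_per_inch: int) -> float:
    """Physical length of a typed line at the given pitch."""
    return chars / chars_per_inch

def character_budget(lines: int = MAX_LINES, chars: int = CHARS_PER_LINE) -> int:
    """Approximate characters and spaces available in one Brief Report."""
    return lines * chars

if __name__ == "__main__":
    print(line_length_inches(CHARS_PER_LINE, ELITE_CPI))  # 3.5
    print(line_length_inches(CHARS_PER_LINE, PICA_CPI))   # 4.2
    print(character_budget())                             # 2940
```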

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 70 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 70-line quota:


1 An extended report of this study may be obtained without charge from John Doe, 300 Market St., Prospect 6, Mass. (giving the author’s full name and address), or for a fee from the American Documentation Institute. Order Document No. —— from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $ for microfilm or $—— for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.





Extended report. The full report is prepared in the style specified by the Publication Manual (1), except that it may be typed with single spacing for economy in photoduplication by the ADI.


Reference 


1. American Psychological Association. Council of 
Editors. Publication manual of the American 
Psychological Association. Psychol. Bull., 1952, 
49, 389-449. 








Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


Predicting Intelligence from the Rorschach¹


Stewart G. Armitage, Paul D. Greenberg, David Pearl,
David G. Berger,² and Paul G. Daston³
VA Hospital, Battle Creek, Michigan 


Rorschach (7) cited seven factors in his 
test which he believed to be important in 
predicting a person’s intelligence. These in- 
cluded: good form perception, many M and 
W responses, a W emphasis in the approach, 
orderly sequence, low A%, and optimal O. 
Beck (3) lists several Rorschach elements 
which he believes are related to intelligence, 
such as: Z, M, W, A%, and others. Klopfer 
and Kelley (6) devote space to the estimation 
of the intellectual level, stressing the number 
and quality of W, the number and quality of 
M, form accuracy level, original responses, 
variety of content, and succession of re- 
sponses. They say, “It is possible to evaluate 
a Rorschach record and to ‘guess,’ in the ma- 
jority of cases within a range of ten points, 
what the intelligence of the subject in terms 
of a Binet IQ might be . . .” (6, p. 274).
They also state that “Roughly speaking, the 
Rorschach results have been found to cor- 
relate as highly with intelligence test results 
as the results of different intelligence tests 
correlate with one another” (6, p. 266). They 
further affirm that the Rorschach method is 
enhanced by its ability to differentiate be- 
tween potential capacity and actual efficiency. 

The experimental investigations reported in the literature provide no definitive evaluation of these statements. They have been primarily concerned with relating single or composite Rorschach scoring variables to various measures of intelligence. In some studies, for example, significant relationships have been obtained between intelligence test scores and M (1, 2, 8), W (2), (W + Dd)/W (2), and Z (9). These relationships, however, have not been consistent from one study to another. Moreover, the studies vary considerably in scope, criteria, and sampling. A clinician associated with a mental hospital finds it rather difficult to apply these findings to his needs.

1 From the VA Hospital, Battle Creek, Michigan.

2 Now at VA Hospital, Leech Farm Road, Pittsburgh, Pennsylvania.

3 Now at VA Hospital, Brockton, Massachusetts.


Statement of the Problem 


The present study was undertaken to determine the accuracy with which the level of intelligence (operationally defined as Wechsler-Bellevue IQ) of the individual neuropsychiatric hospital patient can be predicted from the Rorschach.


Procedure


Two approaches were made to this problem. The first consisted of an attempt to determine whether there is a relationship between Wechsler-Bellevue Form I total IQ and a number of Rorschach factors. This was an objective, statistical procedure utilizing chi-square and correlational techniques. It is outlined and discussed more fully with the results. The second approach involved subjective estimates of the present intellectual status of each individual patient, based upon (a) the Rorschach psychogram, and (b) the Rorschach protocol. In connection with this approach, three staff psychologists were chosen who were considered to be equal in ability, experience, and training. These clinicians were asked to judge 120 Rorschach records obtained from 40 patients diagnosed as psychoneurotic, 40 as unclassified schizophrenic, and 40 as paranoid schizophrenic. The three diagnostic groups were matched on an individual basis as to age and total Wechsler-Bellevue IQ. This matching was fairly stringent, the maximum amount of deviation allowed being three points in total IQ and one year in age. The groups did not differ significantly in these respects. All names and other identifying clues were removed from the records. The psychograms were separated from the protocols so that estimates of intelligence could be made from the psychograms and protocols independently. Each of the judges was asked to predict the present total Wechsler-Bellevue IQ for each protocol and each psychogram. They were asked to list the reasons and methods they used in arriving at each of their predictions. Finally, as one measure of reliability, the records were divided randomly into three groups of comparable IQ and diagnostic representation. Each judge was given 40 of the protocols and 40 of the psychograms again and was asked to re-evaluate them following the procedure outlined above.
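The article does not say how the matched sets were found. The individual matching described above (no more than three IQ points and one year of age apart) can be sketched as a greedy search. This is a hypothetical illustration only. The function names, the greedy strategy, and the reading of each tolerance as a bound on every pairwise difference within a matched set are all assumptions.

```python
# Hypothetical sketch: greedy individual matching of three diagnostic groups
# on total IQ and age. Each patient is represented as an (iq, age) tuple.
IQ_TOL = 3   # maximum IQ deviation allowed within a matched set
AGE_TOL = 1  # maximum age deviation (years) allowed within a matched set

def close_enough(p, q):
    """True if two patients fall within both matching tolerances."""
    return abs(p[0] - q[0]) <= IQ_TOL and abs(p[1] - q[1]) <= AGE_TOL

def match_triples(group_a, group_b, group_c):
    """Return matched (a, b, c) triples, using each patient at most once."""
    used_b, used_c, triples = set(), set(), []
    for a in group_a:
        for i, b in enumerate(group_b):
            if i in used_b or not close_enough(a, b):
                continue
            # Look for an unused member of group C that matches both a and b.
            j = next(
                (j for j, c in enumerate(group_c)
                 if j not in used_c and close_enough(a, c) and close_enough(b, c)),
                None,
            )
            if j is not None:
                used_b.add(i)
                used_c.add(j)
                triples.append((a, b, group_c[j]))
                break  # a is matched; move on to the next patient in group A
    return triples

if __name__ == "__main__":
    neurotic = [(100, 30), (120, 25)]
    paranoid = [(121, 25), (102, 31)]
    unclassified = [(99, 30), (118, 26)]
    print(match_triples(neurotic, paranoid, unclassified))
```

A greedy pass like this can miss matches that an exhaustive or optimal assignment would find. It is shown only to make the tolerance rule concrete.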

The possibility was noted that judgments 
of intelligence may tend to cluster around the 
mean. Thus, if the experimental sample con- 
tained a high proportion of patients whose 
IQ fell within the normal range, and guesses 
are more frequently made in this range, 
spurious accuracy would result. To avoid 
this possibility, each diagnostic subgroup con- 
tained ten records obtained from patients in 
each of the IQ ranges: 120 and above, 110- 
119, 90-109, and 89 and below. This means 
that each IQ range was equally represented 
and the resultant sample was rectangularly, 
not normally, distributed as far as IQ was 
concerned. 
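The reasoning about spurious accuracy can be checked numerically. The sketch below compares a judge who always guesses the same value against two kinds of sample. The first is a normal IQ population (mean 100 and SD 15, an assumption). The second is a rectangular sample with ten cases in each of the four IQ ranges. The bounds used for the open-ended ranges (70 and 139), the evenly spaced values, and the fixed guess of 105 are hypothetical choices made for illustration.

```python
# Sketch: why a rectangular IQ sample guards against "spurious accuracy".
from statistics import NormalDist

def share_within(values, guess, band=10):
    """Fraction of values lying within +/- band points of a fixed guess."""
    return sum(abs(v - guess) <= band for v in values) / len(values)

def normal_share(mean=100.0, sd=15.0, band=10):
    """Fraction of a normal IQ population within +/- band of its mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + band) - dist.cdf(mean - band)

def rectangular_sample(bands, per_band=10):
    """Evenly spaced IQs filling each (low, high) band with the same count."""
    values = []
    for low, high in bands:
        step = (high - low) / (per_band - 1)
        values.extend(low + step * i for i in range(per_band))
    return values

# Hypothetical bounds for the open-ended ranges (89 and below, 120 and above).
BANDS = [(70, 89), (90, 109), (110, 119), (120, 139)]

if __name__ == "__main__":
    print(round(normal_share(), 3))                        # about 0.495
    print(share_within(rectangular_sample(BANDS), 105))    # 0.325
```

Under these assumptions, a constant guess lands within 10 points for about half of a normal sample. For the rectangular sample, the same guess succeeds for only about a third of the cases.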

It should be noted that certain precautions 
were taken in connection with the data. In 
order to assure uniformity of scoring all rec- 
ords were rescored, using Beck’s system and 
his new norms (3). This rescoring was done 
jointly by two staff members not participat- 
ing in the judging procedure. The tests had 
been administered by a large number of ex- 
aminers. In light of the possibility of ex- 
aminer influence on Rorschach findings (5), 
the sample was drawn so that each examiner 
contributed approximately the same number 
of tests to each of the aspects of the sample. 
None of the tests had been given by those 
involved in the judging procedure. Both the Wechsler-Bellevue and Rorschach were administered within 48 hours of the patient’s
admission to the hospital, and during the 
same testing session. As far as could be de- 
termined, none of the patients had taken 
either the Rorschach or Wechsler-Bellevue 
before, although it is possible that some of 
them may have been exposed to these tests 
during their service hospitalization. 


Subjects 


The samples used in the two analyses dif- 
fered. For the objective, statistical approach, 
the records from 503 patients were utilized. 
All were between the ages of 20 and 45 
(mean 29.3) and were World War II veter- 
ans. The mean number of years of education 
for the sample was 10.2. They had either no 
hospitalization prior to their admission or 
were hospitalized only once, that occurring in 
the service. Of the total group, 5 per cent had 
had some type of shock therapy, either IST 
or EST, but none were included whose ther- 
apy occurred within one year prior to testing. 
Three diagnostic categories were represented: 
252 unclassified schizophrenics, 103 paranoid 
schizophrenics, and 148 neurotics of varying 
types. These diagnoses were based on staff 
conferences where consideration was given to 
the patient’s past history, current behavior, 
and the results of the psychological tests. For 
the total sample, the mean W-B IQ was 104.2
(SD 14.8).⁴


4 There were no statistically significant differences 
among the three diagnostic groups with respect to 
either IQ, age, or education. 


Table 1

Intelligence, Age, and Education of Sample Used in Judgmental Analysis

                                         Diagnostic group
                      Unclassified        Paranoid
                      schizophrenic       schizophrenic       Neurotic
                      (N = 40)            (N = 40)            (N = 40)
Variable              Mean     SD         Mean     SD         Mean     SD
W-B IQ (full scale)   106.1    16.9       105.9    16.5       106.4    17.0
Age                    28.6     4.4        28.9     5.2        29.2     5.0
Education              10.1     3.3        10.7     2.7        10.6     2.6








For the judgmental approach, 120 records 
were drawn systematically from the large 
sample of 503 employed in the objective 
analysis. The selection of this population of 
records was dictated by the matching pro- 
cedure described above for the three diag- 
nostic groups used in this aspect of the study. 
Those patients selected from the large sample 
for the judgmental approach had no history 
of shock therapy. The vital statistics for this 
smaller sample are presented in Table 1. 


Results 
Objective Analysis 


The objective, statistical analysis relating Wechsler-Bellevue IQ to various Rorschach scores will be considered first. Using a sample of 503 cases, product-moment r's were computed between W-B IQ and each of 19 Rorschach variables. As can be seen from Table 2, the correlation coefficients were uniformly small. A statistically significant relationship to IQ was obtained, however, with 16 of the variables, 13 of these being at the .01 level of confidence, and three at the .05 level.⁵ Certain of the correlation coefficients can be compared to those obtained in previous investigations (e.g., 8, 9). It will be noted (Table 2) that the present r's are comparable in magnitude to those found by others, although in general those previously obtained were not significant. The fact that our findings are statistically significant seems to be a function of the size of the sample rather than of differences in the strength of the relationship. In those instances where the relationship seemed to depart from linearity, etas were computed, but these did not differ appreciably from the Pearson coefficients.

Table 2

Product-Moment Correlations Between W-B IQ and Rorschach Factors for Present and Previous Investigations

Investigations: Present (N = 503); Wishner (N = 40); Tucker (N = 100)

Rorschach factor            Present    Previous investigations
P                             .34
R                             .26       .212
M                             .26*      .206    .262
Y+T                           .25
D                             .24
M+m+FM                        .24
No. Content Categories        .23       .100
No. Blends                    .22*
No. C Responses               .21
Weighted C                    .18
V                             .18*
m                             .12*
Dd                            .11*
S                             .11
FM                            .09*
F%                            .08
W                             .04       .008
A%                           −.01       .128
F+%                          −.14       .077

Note.—The r's in boldface type are significant at the .01 level; in italics at the .05 level.

* Denotes instances where etas were run because relationship appeared somewhat curvilinear. None of the etas differed appreciably from the r's.
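The point that significance here reflects sample size can be made concrete with critical values of r. The sketch uses the Fisher z approximation, |r|crit ≈ tanh(z(α/2) / √(N − 3)). This is a swapped-in approximation, not necessarily the method the authors used. Applied to the two-decimal present-study coefficients in Table 2, it recovers the reported split of 13 coefficients at the .01 level and 3 at the .05 level. It also shows that an r of .26 would fall short of significance with 40 cases.

```python
# Sketch: approximate two-tailed critical r via the Fisher z transformation.
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at alpha (two-tailed) for n cases, approximately."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))

# Present-study coefficients from Table 2 (N = 503), two-decimal values.
PRESENT = {
    "P": .34, "R": .26, "M": .26, "Y+T": .25, "D": .24, "M+m+FM": .24,
    "No. Content Categories": .23, "No. Blends": .22, "No. C Responses": .21,
    "Weighted C": .18, "V": .18, "m": .12, "Dd": .11, "S": .11, "FM": .09,
    "F%": .08, "W": .04, "A%": -.01, "F+%": -.14,
}

def classify(coeffs, n):
    """Split coefficients into those reaching .01 and those reaching only .05."""
    r01, r05 = critical_r(n, 0.01), critical_r(n, 0.05)
    at_01 = [k for k, r in coeffs.items() if abs(r) >= r01]
    at_05 = [k for k, r in coeffs.items() if r05 <= abs(r) < r01]
    return at_01, at_05

if __name__ == "__main__":
    print(round(critical_r(503, 0.05), 3), round(critical_r(503, 0.01), 3))
    print(round(critical_r(40, 0.05), 3))  # threshold for a 40-case study
```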

In order to determine whether IQ could be predicted more effectively from a composite of Rorschach determinants than from single determinants, a multiple regression equation was computed. For this purpose, the six variables other than M that had shown the highest relationship to IQ were selected. M was not utilized since there is some evidence (1, 2) to suggest that its relationship to IQ may tend toward curvilinearity. As a substitute, a measure of “total M,” i.e., M + m + FM, was used. From the intercorrelations among IQ and the six Rorschach variables employed, a multiple R of .38 was obtained and utilized in computing a multiple regression equation for the prediction of IQ. Predictions based on this equation for an independent sample of 207 cases proved, however, to be ineffective: the percentage of cases which could be correctly placed within ±10 IQ points of the criterion did not exceed chance expectancy.
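A multiple R can be obtained directly from a correlation matrix as R² = r′ R⁻¹ r. Here R is the matrix of predictor intercorrelations and r is the vector of predictor-criterion correlations. The standard error of estimate, SD √(1 − R²), then shows why an R of .38 leaves little room for individual prediction.

The sketch below rests on several assumptions:

- The example correlation matrices are hypothetical, because the study's intercorrelations are not reported.
- The SD of 14.8 is taken from the full sample.
- Prediction errors are assumed to be normally distributed.

```python
# Sketch: multiple R from a correlation matrix, and the hit rate it implies.
# The example matrices are hypothetical; the study's intercorrelations are not given.
import math
from statistics import NormalDist

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multiple_r(r_xx, r_xy):
    """Multiple R and standardized weights from predictor and criterion correlations."""
    beta = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    return math.sqrt(r_squared), beta

def hit_rate(multiple_r_value, sd=14.8, band=10):
    """Share of cases predicted within +/- band points, assuming normal errors."""
    see = sd * math.sqrt(1 - multiple_r_value ** 2)  # standard error of estimate
    return 2 * NormalDist().cdf(band / see) - 1

if __name__ == "__main__":
    print(multiple_r([[1.0, 0.5], [0.5, 1.0]], [0.3, 0.3]))
    print(round(hit_rate(0.38), 3), round(hit_rate(0.0), 3))
```

Under these assumptions, an R of .38 raises the share of cases within ±10 points only from roughly .50 to roughly .53. This is a small gain over simply predicting the mean for everyone.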

Correlational techniques, while inappropriate with certain Rorschach variables where assumptions such as normality may be difficult to meet, were nevertheless employed to permit comparisons with results of previously reported investigations. A more appropriate analysis utilizing chi square in 3 × 4 tables was also carried out. In all cases where significant correlations had been found, the relationship between determinant magnitude and IQ level as shown by chi square was also statistically significant. Similar chi-square analyses were also done for each of the three diagnostic groups separately, and the same general pattern held within each diagnostic group as was true of the total sample.

5 The .05 level was selected as indicative of significance in advance of the analyses described here. The large majority of our significant results, however, lay at or beyond the .01 level of confidence.

For each of the determinants which showed 
a significant relationship to IQ level, an at- 
tempt was made to determine where the sig- 
nificance lay (i.e., were there differences be- 
tween certain IQ levels in the production of 
the various determinants which were respon- 
sible for the over-all significance?). This was 
done by separate chi-square analyses com- 
paring each of the four IQ groups, 89 and 
below, 90-109, 110-119, and 120 and above, 
with each other with respect to the deter- 
minant magnitudes. 
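The chi-square analyses above can be sketched as follows. The code computes a Pearson chi-square of independence for a table of determinant magnitude (rows) by IQ group (columns). It then takes the upper-tail probability from the exact closed form P = e^(−x/2) Σ (x/2)^k / k!, summed for k < df/2. That closed form holds only for even degrees of freedom. It covers both the 3 × 4 table (6 df) and a 3 × 2 pairwise comparison of two IQ groups (2 df). The counts in the example are hypothetical.

```python
# Sketch: chi-square test of independence for an r x c contingency table,
# plus the six pairwise IQ-group comparisons described in the text.
import math
from itertools import combinations

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form is exact for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = stat / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

IQ_GROUPS = ["89 and below", "90-109", "110-119", "120 and above"]
PAIRS = list(combinations(IQ_GROUPS, 2))  # the six pairwise comparisons

if __name__ == "__main__":
    # Hypothetical counts: rows = low/medium/high determinant magnitude.
    table = [[30, 20, 10, 5], [20, 25, 25, 20], [10, 15, 25, 35]]
    stat, df = chi_square(table)
    print(round(stat, 2), df, p_value_even_df(stat, df))
    print(len(PAIRS))
```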

The findings indicated that the bright nor- 
mal (110-119) and superior (120 and above) 
groups did not differ from each other in their 
production of the various Rorschach vari- 
ables. They did, however, differ significantly 
from the average (90-109) group in the pro- 
duction of all of the variables involved in this 
analysis except M and Y, and they differed 
from the below average (89 and below) group 
in every case. Further, the average group dif- 
fered from the below average group with re- 
gard to all variables except the number of 
blends and number of content categories. In 
short, then, in the case of most of the deter- 
minant magnitudes, the average group ex- 
ceeded the below average group, and was in 
turn exceeded by the above average groups. 

Finally, these results seemed to suggest that 
cutting scores might be established which could be used roughly to predict an individual’s IQ. Such cutting scores were obtained
by plotting IQ’s against each of the nine 
Rorschach variables showing the highest cor- 
relations with IQ and by inspection selecting 
those determinant magnitudes which appeared 
best to differentiate the various IQ groups. 
On this basis, a composite index or pattern of 
cutting scores was derived. This index was 
then applied to an independent validating 
sample of 207 NP cases, comparable in diag- 
nostic representation to the original sample. 
The resulting estimates of IQ level placed 
each of the cases of the validating sample in 
one of three broad IQ groupings: average 
range (90-109), above average (110 and 
above), and below average (89 and below), 
and permitted an evaluation of the utility of 
the index. Chi-square analysis revealed that 
prediction of these groupings using the com- 
posite pattern of cutting scores was not sig- 
nificantly better than chance. Further analy- 
ses indicated that this lack of significance was 
true not only of predictions made for the 
total sample, but also for each IQ grouping 
considered separately.⁶


Judgmental Analysis 


As outlined above (see Procedure), this aspect of the study involved subjective estimates, by three judges, of the present functioning IQ of each individual patient in a sample of 120, including 40 neurotics, 40 unclassified schizophrenics, and 40 paranoid schizophrenics. Separate estimates were obtained based upon the Rorschach psychogram and the Rorschach protocol.

6 Similar analyses, to be reported in a subsequent paper, employing the same statistical techniques as were described above in connection with the total sample, were also applied to each diagnostic group separately. The findings revealed no significant differences between the diagnostic groups with respect to the effectiveness with which IQ could be predicted.

Table 3

Correlations Between Judges’ Estimates and W-B IQ for Three Diagnostic Groups

[Table body illegible in the source. Columns: r and rho for protocol-based and psychogram-based estimates within each diagnostic group (unclassified schizophrenic, paranoid schizophrenic, neurotic). Rows: Judges I, II, and III.]

Table 3 indicates the coefficients of correlation between the estimates of present functioning IQ and the actual total Wechsler-Bellevue IQ's. The findings are broken down separately by judge, by diagnostic group, and by whether the estimates were based on psychograms or protocols. As a consequence of the rectangular stratification of the sample, rank-order correlations were utilized. However, product-moment r's were also computed and, surprisingly, were in close approximation to the rhos in every case. Both coefficients are presented in Table 3.

It will be noted that in general the coefficients of correlation for the three judges agree closely, as do the correlations for the three diagnostic groups. The only discrepancy which is at a statistically significant level (p = .05) occurs in judgments of the protocols for the neurotic group, where the coefficient for Judge III is significantly lower than that for Judge I. (It might be possible to speculate about reasons for this finding, but it will be remembered that in a matrix of 18 comparisons it is not surprising to find one significant difference at the .05 level on a chance basis.) In general, the correlations based on the protocols are on the order of +.70 (median r, .69) and those for the psychograms +.40 (median r, .43).
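Two ideas in the passage above can be made concrete.

- The rho coefficients are product-moment correlations computed on ranks. The sketch gives tied values the average of their rank positions.
- The parenthetical remark about 18 comparisons rests on simple arithmetic: 18 × .05 = 0.9 false positives are expected, and 1 − .95¹⁸ ≈ .60 is the chance of at least one. This assumes independent tests. The 18 comparisons here share judges and cases, so they are not truly independent, and the figure is only a rough guide.

```python
# Sketch: Spearman's rho as the Pearson r of ranks (average ranks for ties),
# and the chance of at least one p < .05 result among 18 comparisons.
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Rank-order correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def chance_of_any(k=18, alpha=0.05):
    """P(at least one false positive) among k independent tests."""
    return 1 - (1 - alpha) ** k

if __name__ == "__main__":
    x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]   # monotone but not linear
    print(spearman(x, y), round(pearson(x, y), 3))
    print(round(chance_of_any(), 3), 18 * 0.05)
```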

Two measures of the reliability of the judges’ estimates of intelligence were computed. The first was intrajudge reliability, based on having the judges repeat their estimates of intelligence on a sample of 40 cases at a later point in time. The second was interjudge reliability. Table 4 indicates the intrajudge reliability coefficients for each of the three judges on the protocols and psychograms separately. For the protocols, the reliability coefficients are on the order of +.50 (median r, .55), and for the psychograms on the order of +.80 (median r, .84). As far as the interjudge reliability is concerned (see Table 5), the coefficients were all on the order of +.70 (median r, .69), and this was true for both the protocols (median r, .71) and the psychograms (median r, .68). The size of the sample to be rejudged was not large enough to permit fractionating the data in order to consider the three diagnostic groups separately.

Table 4

Intrajudge Reliability Coefficients Based on Estimates of IQ from Protocols and Psychograms

              Protocol          Psychogram
Judge         r       rho       r       rho
I             .58     .60       .84     .84
II            .55     .56       .87     .8?
III           .45     .4?       .72     .74

Table 5

Interjudge Reliability Coefficients for Three Diagnostic Groups Based on Estimates of IQ from Protocols and Psychograms

              Unclassified schizophrenic    Paranoid schizophrenic       Neurotic
              Protocol      Psychogram      Protocol      Psychogram     Protocol      Psychogram
Judges        r     rho     r     rho       r     rho     r     rho      r     rho     r     rho
I with II     .82   .83     .68   .70       .77   .76     .70   .69      .70   .70     .71   .65
I with III    .71   .66     .68   .68       .72   .74     .68   .66      .57   .59     .69   .67
II with III   .75   .69     .64   .68       .61   .61     ?     ?        ?     ?       ?     ?

In order to provide a more specific answer to the question of how accurately clinicians can estimate functioning intelligence from the Rorschach, an analysis was made of the percentage of cases in which the judges’ estimates of functioning IQ were within 10 points of the actual W-B IQ (Table 6). These percentages were then compared, in a chi-square analysis, to the accuracy of prediction that would be expected on a chance basis. Chance accuracy was empirically determined by having four clinicians unfamiliar with the study guess the IQ levels of each of a sample of 120 cases. They were provided with nothing other than 120 code numbers unrelated to IQ and were asked to guess the IQ of each of the individuals represented by these code numbers. It was found, on the basis of this empirical determination of chance accuracy, that 27 per cent of the cases were correctly placed within 10 points of the criterion.

Table 6

Percentage of Cases in Which Judges’ Estimates Were Within 10 Points of W-B IQ

          Unclassified schizophrenic   Paranoid schizophrenic    Neurotic
Judge     Protocol   Psychogram        Protocol   Psychogram     Protocol   Psychogram
I         62.5       45.0              70.0       50.0           72.5       52.5
II        55.0       47.5              57.5       50.0           55.0       42.5
III       50.0       37.5              55.0       35.0           45.0       35.0

Chi-square analyses yielded the following findings:

1. Rorschach protocols.

a. The estimates of each of the judges based on the protocols were “correct” (within ±10 points of the criterion) in a greater percentage of cases than would be expected by chance (significant beyond the .001 level in each case).

b. When the judges’ estimates were pooled, analysis revealed no significant differences among diagnostic groups in the effectiveness with which the IQ could be predicted. While each judge tended to vary somewhat in the accuracy of his judgments for the different diagnostic groups, analysis again demonstrated that these differences were not significant.

c. In an over-all comparison of the accuracy of the three judges there were no significant differences. Furthermore, the judges did not differ among themselves in their accuracy in judging the protocols of either of the schizophrenic groups. There was, however, a significant difference between the judges on the protocols of the neurotic group, and this was essentially attributable to the fact that Judge III did significantly less well than Judge I.

2. Rorschach psychograms. 

a. The estimates of Judges I and II based 
on the psychograms were “correct” in a 
greater percentage of cases than would be ex- 
pected by chance (p = .001 and p = .01, re- 
spectively). The accuracy of Judge III did 
not quite meet the usual statistical criteria 
for significance. 

b. The differences among the three judges 
in their accuracy in judging the psychograms 
were not, however, at a statistically signifi- 
cant level. 

c. There were no significant differences 
among diagnostic groups in the effectiveness 
with which IQ could be judged from the psy- 
chograms. 

3. Additional comparisons. 

a. The accuracy of the judges was signifi- 
cantly greater (p= .01) on the protocols 
than on the psychograms. 

b. For both protocols and psychograms, accuracy of prediction did not differ for the
different IQ levels. In this respect, no differ- 
ences were found between judges or diag- 
nostic groups. 
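Each cell of Table 6 is a percentage of 40 cases, so it converts back to a whole-number count of "correct" estimates. The sketch below shows the within-10-points criterion and that conversion. The judges' protocol hits can then be set against the 27 per cent chance rate reported above, which corresponds to 32.4 hits out of 120. The function names are hypothetical. The sketch reproduces no significance test, because the exact form of the authors' chi-square comparison is not specified.

```python
# Sketch: the "within 10 points" hit criterion and Table 6 as case counts.
CHANCE_RATE = 0.27    # empirically determined chance accuracy (from the text)
CASES_PER_CELL = 40   # each diagnostic group contributed 40 records

def within_band(estimates, criteria, band=10):
    """Share of estimates lying within +/- band points of the criterion IQ."""
    hits = sum(abs(e - c) <= band for e, c in zip(estimates, criteria))
    return hits / len(criteria)

# Table 6 percentages; columns alternate protocol, psychogram for the
# unclassified schizophrenic, paranoid schizophrenic, and neurotic groups.
TABLE_6 = {
    "I":   [62.5, 45.0, 70.0, 50.0, 72.5, 52.5],
    "II":  [55.0, 47.5, 57.5, 50.0, 55.0, 42.5],
    "III": [50.0, 37.5, 55.0, 35.0, 45.0, 35.0],
}

def to_counts(row):
    """Convert percentages of 40 cases back to whole-number hit counts."""
    return [round(p * CASES_PER_CELL / 100) for p in row]

def totals(judge):
    """Hits out of 120 for one judge, as (protocols, psychograms)."""
    counts = to_counts(TABLE_6[judge])
    return sum(counts[0::2]), sum(counts[1::2])

if __name__ == "__main__":
    expected = CHANCE_RATE * 3 * CASES_PER_CELL  # about 32.4 hits out of 120
    for judge in TABLE_6:
        print(judge, totals(judge), round(expected, 1))
```

On this conversion, the protocol totals are 82, 67, and 60 hits out of 120 for Judges I, II, and III. Each is well above the 32.4 expected at the chance rate.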


Discussion 


As has been described above, two approaches were made to the problem of the accuracy with which level of intellectual functioning could be predicted from the Rorschach. One was an objective, statistical approach, and the other a judgmental approach.





The question might well be asked as to why 
the objective analysis was even attempted, 
since leaving the clinician out creates an 
artificial situation. The enumeration by Beck 
(3), Rorschach (7), Klopfer and Kelley (6), 
and others of discrete Rorschach scoring elements as predictive of an individual’s intelligence tends to create the impression that these
elements can be used in a mechanical way 
to estimate intellectual level. Therefore, the 
objective analysis was made in an attempt 
to determine whether these elements, either 
singly or in various constellations, are valid 
predictors of intelligence in the absence of the 
integrations furnished by the clinician. In the 
present study, the attempt to directly relate 
single Rorschach variables to intelligence was 
unproductive. Even the attempt to allow for 
more complex combinations of determinant 
relationships by means of the multiple regres- 
sion equation failed to yield useful estimates 
of intelligence. A further approach utilizing 
those Rorschach variables most highly related 
to IQ in an effort to establish cutting scores 
for the prediction of intelligence was not pro- 
ductive despite the use of a number of dif- 
ferent patterns of weighting. Conceivably, an 
approach employing more comprehensive con- 
figurations along the lines proposed by Cron- 
bach (4) and others would be necessary to 
establish meaningful findings. It seems doubt- 
ful, however, if any approach simply using 
traditional objective scoring categories would 
be fruitful, at least from the standpoint of 
individual prediction. 

In contrast to the objective treatment, the 
judgmental approach, of course, constituted 
an analysis which included the integrating 
factor of the clinician. This approach in- 
volved separate evaluations of the protocol 
and psychogram in order to try to weigh the 
extent to which the relatively objective fac- 
tors in the psychogram and the relatively 
subjective factors in the protocol determined 
the clinician’s judgment of functioning intelli- 
gence. A comparison was thus permitted on 
the one hand between the effectiveness with 
which the psychogram and protocol can be 
used for predicting intelligence, and on the 
other hand between each of these and the 
findings from the objective part of the study. 
The findings showed accuracy of prediction to be in the (descending) order of protocol,
psychogram, and objective analysis. 

Certain factors appear logically to account for these findings. They help explain why the judgmental approach was more accurate than the objective analysis even when only the psychograms were used, and why estimates based on the protocols were more accurate than either. It is to be recalled that each judge was asked to list those factors on which he felt his estimates of intelligence to be based. In the case of the psychograms, the factors listed by the judges included all of those traditionally utilized for the prediction of intelligence, such as M, W, R, number of content categories, etc. However, although both the judges and the objective analysis appeared to utilize the same factors, the judges all tended to attain greater accuracy. This suggests that the clinician makes use of these factors in a somewhat different way than can be accomplished through the objective analysis. It seems probable that he may be able to assign more subtle weightings to constellations of these factors than was possible in the objective aspect. Furthermore, the clinician probably capitalizes on inferences from such additional, subjective factors as (a) the use of specific content categories (e.g., science), (b) the kinds of blends utilized, and (c) the apparent presence of extreme anxiety.

Explanations for the greater accuracy found with protocols are probably quite obvious. The factors consistently listed by all of the judges as bearing the most weight in their predictions from the protocol were the quality of vocabulary and of perceptual organization. From the present design it is not possible to determine whether these factors alone account for the degree of accuracy of prediction from the protocols, or whether a contribution was made by other factors such as implicitly scoring a record in the process of reading through the protocol. Additional research is planned bearing on this problem of isolating the degree to which various factors are utilized by the clinician in his prediction of intelligence from the Rorschach.

An important aspect of assessing the accuracy with which clinicians can predict functioning intelligence from the Rorschach is the
reliability of their judgments. In the present
study, two reliability measures were em- 
ployed: interjudge and intrajudge reliability. 
The interjudge reliability coefficients were on 
the order of +.70 for both protocol and psy-
chogram judgments. It would ordinarily be 
expected that intrajudge reliability would be 
slightly greater than interjudge. In the case 
of the psychograms, this was true. With the 
protocols, however, the intrajudge reliability 
coefficients were rather low. They were con- 
siderably below both the interjudge coeffi- 
cients for the protocols and the intrajudge 
coefficients for the psychograms. There are 
probably at least two factors which account 
for the low intrajudge reliability with the 
protocols. For one thing, it seems probable 
that the greater consistency in the case of the 
psychograms may have occurred because in 
rating them the judges tended to employ rela- 
tively fixed, objective criteria, whereas in rat- 
ing the protocols they tended to use more 
flexible, intuitive “hunches” based on such 
things as quality of percepts and vocabulary. 
Secondly, the factor of impatience or boredom 
may well have played a role. Each judge 
rated 120 protocols and 120 psychograms. At 
the conclusion of this initial task, they were 
again given 40 of the protocols and 40 of the 
psychograms to be rejudged. In judging the 
psychograms, the judges tended to employ 
more or less fixed frames of reference, which 
could be rapidly and painlessly applied. How- 
ever, judging the protocols, which involved 
rereading of records, some of which required 
some effort to decipher, was a more laborious 
task. It seems highly probable that the re- 
judgments on which the intrajudge reliability 
figures are based were made more hurriedly 
and with less deliberation than those em- 
ployed in the original judging procedure. 
This may be reflected in the finding that the 
percentage of correct placements (within 10 
points of the criterion) for the group of rec- 
ords that was rejudged tended to be lower 
(though not significantly so) than for the 
whole sample judged originally. Further re- 
search seems indicated in an attempt to ex- 
plain the low intrajudge reliability for the 
protocols and to determine whether more 
rigorous frames of reference can be estab- 
lished to enhance the accuracy of prediction. 
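The two reliability indices and the "within 10 points" accuracy criterion can be made concrete with a short Python sketch. It assumes that reliability is expressed as a Pearson correlation between two sets of IQ estimates (two judges on the same records for interjudge, one judge on two occasions for intrajudge) and that "within 10 points" means an absolute difference of at most 10; the article states neither detail, and every number below is invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def hit_rate(estimates, criterion, tolerance=10):
    """Share of IQ estimates within `tolerance` points of the W-B criterion (assumed inclusive)."""
    hits = sum(abs(e - c) <= tolerance for e, c in zip(estimates, criterion))
    return hits / len(criterion)

# Hypothetical IQ values for six records (illustrative numbers only).
wb_iq        = [92, 104, 118, 85, 110, 99]    # Wechsler-Bellevue criterion
judge_a      = [95, 100, 110, 90, 122, 100]   # judge A, first rating
judge_b      = [90, 108, 115, 95, 105, 102]   # judge B, same records
judge_a_redo = [100, 98, 105, 88, 115, 110]   # judge A, rejudged later

inter = pearson_r(judge_a, judge_b)       # interjudge: two judges, same records
intra = pearson_r(judge_a, judge_a_redo)  # intrajudge: one judge, two occasions
print(f"interjudge r = {inter:.2f}, intrajudge r = {intra:.2f}")
print(f"judge A hits within 10 points: {hit_rate(judge_a, wb_iq):.0%}")
```

Note that a correlation measures consistency of rank ordering, not agreement in absolute level, which is one reason the hit rate is reported separately.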




The development of such more rigorous 
frames of reference would in turn depend 
upon research designed to measure the weight 
or importance attached by the judges to such 
factors as vocabulary and quality of percep- 
tual organization. 

It is important to note the fact that there 
were individual differences among the judges 
and to recognize the implication of these inter- 
rater differences for clinical practice. Instru- 
ments such as the Rorschach have to be used 
by clinicians. If clinicians do not use the in- 
strument in the same way, how much confi- 
dence is it possible to have in the judgments 
of a particular clinician? Generally, the judges 
showed consistency among themselves. It was 
only in the case of the neurotic group that one 
judge did significantly less well than another. 
It is difficult to say whether this is a func- 
tion of the particular diagnostic group in- 
volved, or is the idiosyncrasy of the judge, 
or is simply a chance finding. Again, this 
might be an area for further research. 

With respect to the question of the effec- 
tiveness with which individual judges can 
estimate intelligence from the Rorschach, it 
might be asked what type of clinician was 
used for judges in this study, and why. Well- 
known Rorschach authorities were not em- 
ployed, inasmuch as our focus of interest was 
on the performance of the journeyman clini- 
cian. That is, the question to which we de- 
voted ourselves was not how well a few 
clinicians can estimate intelligence from the 
Rorschach, but rather, how well the bulk of 
clinicians can do. 

It remains to summarize and evaluate the 
implications of the findings. Do the estimates 
of intelligence obtained from the Rorschach 
conform closely enough to the Wechsler- 
Bellevue scores so that the Rorschach could 
be substituted for the W-B for the prediction 
of intelligence? Our findings would indicate 
that, certainly for the purpose of individual 
prediction, the Rorschach estimates cannot be 
substituted for the W-B scores. Our results 
do suggest, however, that the Rorschach 
protocol, in the hands of the practicing clini- 
cian, does, in the majority of cases, permit a 
fairly accurate estimate of intellectual func- 
tioning. While the frequency of accurate esti- 
mates is not great enough for individual prediction, it does permit prediction with some
degree of accuracy on a group basis. This is 
at a level highly significant beyond chance 
expectancy* and suggests further avenues of
exploration into the utility of the Rorschach 
for the prediction of intelligence: 

1. An attempt to identify factors respon- 
sible for the accuracy of prediction which we 
do obtain. 

To what extent, for example, is the clinician actually utilizing the Rorschach for his prediction of intelligence from the protocol, and to what extent is he simply utilizing verbalization which might be tapped by other methods? Thus it would be possible to design a study in which intelligence is estimated (a) from lists of key words and phrases abstracted from the Rorschach protocol, (b) from interviews, (c) from interview plus Rorschach, and (d) from other projective data, such as TAT protocols.

2. A systematic qualitative analysis of the 
individual records, particularly the protocols, 
in an attempt to isolate factors which may be 
related to the ability to predict intelligence. 
This might involve such questions as: Do 
judges uniformly do poorly in the case of
bland, unproductive records? Is the judge 
influenced one way or the other by the esti- 
mate of severity of illness which he implicitly 
forms from the record? 

3. Research related to the question of in- 
tellectual potential. In certain cases where the 
clinician’s estimates are inaccurate, is this 
possibly a product of the fact that he is 
judging potential rather than functioning in- 
telligence? More comprehensively the ques- 
tion might be: How accurately can the cli- 
nician estimate intellectual potential? It is 
apparent that no adequate criteria for intel- 
lectual potential are available to us. Conse- 
quently, the most productive approach would 
be a longitudinal study, wherein estimates of intellectual potential from the Rorschach would be compared with supposed premorbid measures of intelligence obtained during the childhood of the individual.

* These findings are contingent upon a clarification of the present reliability findings. As the results stand at present, the validity of even group predictions would be open to question.


Summary 


The problem of the accuracy with which level of intellectual functioning can be predicted from the Rorschach was investigated in two ways. One was an objective, statistical approach and the other a judgmental approach. The attempt to relate, statistically, single or composite Rorschach scoring variables to Wechsler-Bellevue intelligence failed to yield useful estimates of intelligence. Clinicians, using just the Rorschach psychograms, tended to attain somewhat greater accuracy of prediction. When, however, the clinicians judged the Rorschach protocols, fairly accurate estimates of intellectual functioning were obtained. Findings regarding the reliability of the judgments and individual differences among the judges in the accuracy of their estimates were discussed. Some areas for further research were delineated.

Received December 13, 1954.
References

1. Altus, W. D., & Altus, G. T. Rorschach [...] variables and intelligence. J. abnorm. soc. Psychol., 1952, 47, 531-[...].
2. Altus, W. D., & Thompson, Grace M. The Rorschach as a measure of intelligence. J. consult. Psychol., 1949, 13, 341-347.
3. Beck, S. J. Rorschach's test. Vol. 1. Basic processes. New York: Grune & Stratton, 1950.
4. Cronbach, L. J. Statistical methods for multi-score tests. J. clin. Psychol., 1950, 6, 21-[...].
5. Gibby, R. G. Examiner influence on the Rorschach inquiry. J. consult. Psychol., 1952, 16, [...].
6. Klopfer, B., & Kelley, D. M. The Rorschach technique. Yonkers, N. Y.: World Book Co., 1942.
7. Rorschach, H. Psychodiagnostics. Berne: Huber, 1942.
8. Tucker, J. E. Rorschach human and other movement responses in relation to intelligence. J. consult. Psychol., 1950, 14, 283-286.
9. Wishner, J. Rorschach intellectual indicators in neurotics. Amer. J. Orthopsychiat., 1948, 18, 265-279.








Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


Validity of the Grassi-Fairfield Block Substitution 
Test in Differential Diagnosis¹


Pearl Harris²
Trenton State Hospital 


This study was undertaken to test the va- 
lidity of the Grassi-Fairfield Block Substitu- 
tion Test (GFBST) and to ascertain whether 
or not a relationship exists between Grassi 
Test performance and intelligence. 

The experimental group consisted of 26 or- 
ganic patients. The control groups consisted 
of 8 psychotic and 12 nonpsychotic patients. 
The GFBST was administered using the directions and norms suggested by Grassi (0-16 points, organic; 16-18, schizophrenic). Subsequently each patient was given a Wechsler.

Of the 26 organics, 11 (42%) obtained a 
score of 16 or higher, while 17 (85%) of the 
20 controls scored 16 or higher. A chi-square 
test of the significance of the difference 
yielded a p of .006. A comparison of the means of the two groups also revealed a significant difference (t = 3.53; p < .01). However, when a comparison of the means of the organic and psychotic groups was made, the difference was not significant (t = 1.21; p = .43).
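The chi-square reported above can be recomputed from the frequencies in the text: 15 of 26 organics and 3 of 20 controls scored below 16. The Python sketch below computes the 2 × 2 chi-square with and without Yates's correction. Which variant the author used is not stated, and neither lands exactly on the reported p of .006 (the uncorrected value gives p ≈ .003, the corrected one p ≈ .008), so treat it as a check on magnitude rather than a reproduction.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (df = 1) for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). For one degree of freedom the upper-tail p equals
    erfc(sqrt(chi2 / 2)), so no statistics library is needed.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Rows: organics (26), controls (20); columns: score < 16, score >= 16.
organic_below, organic_above = 26 - 11, 11
control_below, control_above = 20 - 17, 17

for yates in (False, True):
    chi2, p = chi_square_2x2(organic_below, organic_above,
                             control_below, control_above, yates=yates)
    label = "with Yates" if yates else "uncorrected"
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.4f}")
```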

The average score of the organics in the present study (13.7) was somewhat higher than the average of Grassi's group (10.2), and the range (0-23) was greater than Grassi's (2.5-16). Whereas Grassi stated that deteriorated alcoholics fall in the middle range (16-20), in the present study 5 of the 6 alcoholics who had been diagnosed as A.B.S. or C.B.S. (and who were included in the organic group) scored below 16.

¹ An extended report of this study may be obtained without charge from Pearl Harris, State Hospital, Trenton, N. J., or for a fee from the American Documentation Institute. Order Document No. 4626 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

² Now with the Institute of the Pennsylvania Hospital.

The scores of Grassi's psychotic group ranged from 16-26, whereas those of the present study ranged from 7-22.5. This difference may be accounted for in part by the fact that Grassi's psychotics were all schizophrenics, whereas the present study included other forms of psychosis.

The GFBST correctly differentiated 70% 
of the subjects as organic or nonorganic. This 
was below the 93-96% accuracy indicated by 
Grassi’s statistics. The percentage of patients 
correctly identified as organics by the test 
was 58% which is markedly below Grassi’s 
claimed 90%. It correctly excluded 62.5% of
the psychotics from the organic range (71%
if those with IQs below 70 are excluded), as
compared with Grassi's 94%. As in Grassi's
study, 100% of the nonpsychotic group were 
excluded from the organic range. 
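The classification percentages in this paragraph follow from counts that can be recovered from the reported figures (15 of 26 organics below 16; 5 of 8 psychotics and all 12 nonpsychotics at or above it). A small Python sketch, with the counts inferred from the percentages rather than taken from the extended report:

```python
def pct(num, den):
    """Percentage of num out of den."""
    return 100 * num / den

# Counts inferred from the reported percentages. "Correct" means an organic
# scoring below 16, or a nonorganic scoring 16 or higher (Grassi's cutoff).
groups = {
    # name: (n, number classified correctly, is_organic)
    "organic": (26, 15, True),
    "psychotic": (8, 5, False),
    "nonpsychotic": (12, 12, False),
}

for name, (n, correct, _) in groups.items():
    print(f"{name:>12}: {correct}/{n} correct = {pct(correct, n):.1f}%")

total_n = sum(n for n, _, _ in groups.values())
total_correct = sum(c for _, c, _ in groups.values())
print(f"     overall: {total_correct}/{total_n} = {pct(total_correct, total_n):.1f}%")

# Cross-check with the earlier paragraph: controls scoring 16 or higher.
controls_above = sum(c for _, c, organic in groups.values() if not organic)
print("controls scoring 16 or higher:", controls_above)
```

The overall 32/46 (69.6%) rounds to the reported 70%, and the 17 controls at or above 16 agree with the count given earlier.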

All 3 organics with IQ’s of 110 or better 
scored above 16, suggesting that intelligence 
is a factor. However, since the number of sub- 
jects was small, further study is needed. 

In conclusion, significant differences were 
found between groups of organics and non- 
organics but since considerable overlap oc- 
curs between scores of organics and psychot- 
ics, the test cannot be used with confidence 
for differential diagnoses between such pa- 
tients. With nonpsychotic individuals, the 
test is highly reliable in detecting organicity. 
Results also suggest that high IQ’s influence 
test scores. 

Brief Report 
Received May 16, 1955. 


Reference 


1. Grassi, J. R. The Fairfield Block Substitution 
Test for measuring intellectual impairment. 
Psychiat. Quart., 1947, 21, 474-489. 




The Rorschach as a Means of Predicting 
Treatment Outcome¹


Gordon Filmer-Bennett 
Norfolk (Nebraska) State Hospital²


The evidence to date for specific Rorschach 
indices which might reliably and consistently 
serve to predict treatment outcome appears 
inconclusive (3). Predictive indices proposed 
by one study have, with few exceptions, 
failed to find corroboration in another. 

The present investigation, an outgrowth of 
an earlier one (1), was designed to test the 
possibility that a more global Rorschach ap- 
praisal might lead to greater accuracy in pre- 
diction, thereby implying the presence of cer- 
tain “hidden” cues. That is, if the Rorschach 
constitutes a basis in and of itself for predict- 
ing response to treatment, such should be re- 
vealed by comparative evaluations of the total 
protocols of patients with contrasting treat- 
ment outcome. 


Subjects 


The Rorschachs under study were obtained 
from 22 inpatients, admitted to the Univer- 
sity of Pittsburgh’s Western Psychiatric In- 
stitute, half of whom later responded favor- 
ably to treatment and half of whom did not. 
These patients represented a variety of diag- 
noses as determined by staff conferences held 
shortly after admission, but did not include 
those in whom organic disorder was suspect. 
Several among the patients had received prior 
treatment elsewhere, their inclusion in the 
sample being contingent upon such treatment 
having been concluded a minimum of eight weeks before the commencement of psychological testing. In any case, all patients underwent some form of therapy at the Institute subsequent to the psychological examination, such therapy consisting of either psychotherapy or shock therapy (e.g., electroshock or insulin or both), or occupational therapy alone, or a combination thereof.

¹ The author is indebted to Dr. Gerald R. Pascal for his original suggestions, and also to those who so generously gave of their time in judging the protocols.

² This study was completed at the Lincoln State Hospital, Lincoln, Nebraska.

Favorable or unfavorable response to treat- 
ment was judged on the basis of follow-up 
studies conducted on an average of more 
than two years beyond termination of hos- 
pitalization. Improvement signified a con- 
tinuously satisfactory vocational and social 
adjustment after leaving the hospital. Failure 
to improve defined those patients who were 
later returned to the hospital or to some 
other mental institution and who still re- 
mained hospitalized at the time the follow-up 
study was made. All instances of questionable 
improvement were omitted, as in the case of 
those who failed to maintain an adequate ad- 
justment both vocationally and socially or 
those who remained outside the hospital more 
by reason of a reduction of environmental de- 
mands than because of an actual change in 
behavior. 
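The follow-up rule amounts to a three-way classification with an explicit "omit" outcome. The sketch below encodes it in Python; the record fields are hypothetical stand-ins for whatever the follow-up studies actually recorded, and the handling of edge cases (for example, a readmitted patient who was later discharged again) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FollowUp:
    # Hypothetical fields, not the study's actual follow-up schedule.
    hospitalized_now: bool         # in this or another mental institution at follow-up
    readmitted: bool               # returned to a hospital after discharge
    vocational_ok: bool            # continuously satisfactory vocational adjustment
    social_ok: bool                # continuously satisfactory social adjustment
    reduced_demands: bool = False  # stays out mainly because demands were lowered

def classify(f: FollowUp) -> Optional[str]:
    """Return 'improved', 'unimproved', or None (questionable, so omitted)."""
    if f.readmitted and f.hospitalized_now:
        return "unimproved"
    if f.vocational_ok and f.social_ok and not f.readmitted and not f.reduced_demands:
        return "improved"
    return None  # any other pattern counts as questionable improvement
```

For example, a patient never readmitted but with adequate social and poor vocational adjustment returns `None` and is dropped from the sample, as the text prescribes.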


Procedure 


The 22 pretreatment Rorschachs were paired 
according to an individual matching of pa- 
tients who later improved with patients who 
failed to improve, totaling 11 pairs in all. 
Both patients in seven of the 11 pairs were 
diagnosed as schizophrenic, in two of the pairs 
as manic-depressive, and in the final two pairs 
as psychoneurotic.³ Further bases for matching, in addition to diagnosis, were age, sex, marital status, intelligence, education, chronicity, and type of therapy (shock or nonshock). Mean age of the improved group was 32.6 years (SD, 8.0), and of the unimproved group 34.0 years (SD, 7.9), while intelligence ratings of all patients in the two groups fell within approximately the average range. Statistical differences between the two groups on these and all other variables were negligible, the resultant probability values being .40 or greater.

³ Identical matching in terms of diagnostic subcategories (e.g., catatonic with catatonic) was obtained in seven of the 11 pairs.
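As a rough check on the age comparison, a pooled-variance t can be computed from the reported means and SDs. This Python sketch assumes the SDs are sample SDs and treats the groups as independent; because the patients were individually matched, a paired test on the pair differences would be more appropriate, but those differences are not reported.

```python
import math

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t with a pooled variance estimate; df = n1 + n2 - 2."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, n1 + n2 - 2

# Improved: 32.6 (SD 8.0); unimproved: 34.0 (SD 7.9); 11 patients each.
t, df = t_from_summary(32.6, 8.0, 11, 34.0, 7.9, 11)
print(f"t({df}) = {t:.2f}")  # compare with the two-tailed .05 critical value, 2.086 for df = 20
```

The resulting t of about 0.41 is far below the critical value, in line with the negligible differences the author reports.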

Next, the Rorschach protocols were coded 
and randomly arranged within each pair. 
They were then presented, together with in- 
structions and a questionnaire blank, to 12 
psychologists who were asked to select out 
the “improved” from the “unimproved” pro- 
tocol in each pair and where possible to give 
reasons for the choice. No information was given other than the sex, marital status, and age range for each pair. All psychologists had received the Diplomate in Clinical Psychology or possessed the training and experience necessary to qualify them for the ABEPP examinations.


Results 


A frequency distribution describing the accuracy with which the 12 psychologists judged the paired protocols is shown in Figure 1. It points to an average of 6.0 pairs judged correctly, with an accuracy range of four to nine pairs. Inspection suggests an approximation to the binomial distribution curve, indicating the probable role of chance factors. Only one psychologist achieved better than chance expectancy (p = .03), with a total of nine correct estimates out of a possible eleven. That two psychologists correctly judged seven pairs agrees with what would be expected on a chance basis.

Fig. 1. Accuracy of prediction. (Number of judges, N = 12, plotted against number of pairs judged correctly.)
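The chance baseline here is a binomial with 11 trials and p = .5, since a judge who guesses picks the right protocol in a pair half the time. The Python sketch below, assuming independent and unbiased guesses, recovers the one-tailed probability of nine or more correct, 67/2048 ≈ .033, consistent with the reported p = .03, and shows that about 1.9 of the 12 judges would be expected to score exactly seven by chance.

```python
from math import comb

N_PAIRS = 11   # pairs each judge had to call
N_JUDGES = 12

def p_at_least(k, n=N_PAIRS, p=0.5):
    """Upper-tail binomial probability P(X >= k) for n independent guesses."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(f"P(X >= 9) = {p_at_least(9):.4f}")          # one-tailed chance probability of 9+ correct
print(f"expected mean correct = {N_PAIRS * 0.5}")  # under pure guessing

# Expected number of the 12 judges at each score if every judge guessed.
for k in range(N_PAIRS + 1):
    expected = N_JUDGES * comb(N_PAIRS, k) / 2 ** N_PAIRS
    print(f"{k:2d} correct: {expected:.2f} judges expected")
```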

The relative accuracy with which the individual pairs of protocols were judged is indicated in Table 1. It will be noted that all participants judged one pair correctly while failing completely in the case of another pair. Comparative study of the objective features of all paired protocols, together with an examination of those returned questionnaire forms which listed reasons behind the respective judgments,⁴ failed to reveal any consistent pattern or criteria which might form a basis of explanation as to why some pairs were more accurately judged than were others. Nor was there any apparent relationship between accuracy of judgments and such variables as chronicity or type of treatment given. Accordingly, it was decided to apply the Rorschach Prognostic Rating Scale (2, pp. 688-699) in an effort to uncover possible correlatives of the judgments obtained.

Table 1
Judgments for Each Pair Compared with Relative Groupings on the Prognostic Rating Scale

       No. correct     Grouping on rating scale    Direction of change
       judgments       for each member of pair     between paired groupings
Pair   (maximum 12)    Improved     Unimproved     on rating scale
A      12              III          V              Positive
B      11              III          IV             Positive
C      11              II           V              Positive
D      10              III          VI             Positive
E       8              III          III            Neutral
F       8              IV           II             Negative
G       5              III          III            Neutral
H       3              III          II             Negative
I       3              III          III            Neutral
J       1              ...          ...            ...
K       0              ...          ...            ...

Reference to Table 1 shows that each of the four pairs judged most accurately also attained a higher scoring of the “improved” over the “unimproved” protocol on the Rating Scale, described here as a change in the positive (i.e., expected) direction. Each of the remaining seven pairs, which were judged less accurately, showed either the same scoring or a lower scoring of the “improved” as against the “unimproved” protocol on the Rating Scale, labeled respectively as a neutral or a negative change of direction. This seeming relationship between the psychologists' judgments and relative groupings on the Rating Scale suggests that the judgments relied in part upon criteria bearing on the potential adjustment which the Scale purports to measure.
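The positive/neutral/negative labels can be derived mechanically from the groupings in Table 1. This Python sketch assumes that a lower-numbered grouping denotes a more favorable scoring, an inference from the four most accurately judged pairs rather than something the text states, and omits pairs J and K, whose groupings are illegible (both are among the seven non-positive pairs). The last line is a simple illustration not reported by the author: if the four positive pairs were scattered at random among the 11, the chance that they would coincide with the four most accurately judged pairs is 1/C(11, 4) = 1/330.

```python
from math import comb

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}

def direction(improved_group, unimproved_group):
    """'positive' if the improved protocol sits in the more favorable (assumed
    lower-numbered) grouping, 'neutral' if equal, otherwise 'negative'."""
    a, b = ROMAN[improved_group], ROMAN[unimproved_group]
    if a < b:
        return "positive"
    if a == b:
        return "neutral"
    return "negative"

# Pairs A-I from Table 1: (correct judgments out of 12, improved, unimproved).
table = {
    "A": (12, "III", "V"), "B": (11, "III", "IV"), "C": (11, "II", "V"),
    "D": (10, "III", "VI"), "E": (8, "III", "III"), "F": (8, "IV", "II"),
    "G": (5, "III", "III"), "H": (3, "III", "II"), "I": (3, "III", "III"),
}
for pair, (correct, imp, unimp) in table.items():
    print(pair, correct, direction(imp, unimp))

# Chance that 4 randomly placed positive pairs are exactly the 4 most accurate.
print(f"1 / C(11, 4) = {1 / comb(11, 4):.4f}")
```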

Discussion 

The above findings suggest certain limita- 
tions in the prognostic effectiveness of the 
Rorschach, as we currently understand it. 
Further clues to the nature of these limita- 
tions were contained in the written reports of 
10 of the 12 participants in this study. Analy- 
sis of their content suggested that the basis 
for judgment varied from psychologist to psy- 
chologist and, to a lesser extent, from pair to 
pair for a given participant. 

While, in general, the majority voiced reliance upon content-oriented analysis, several claimed to have weighted formal scoring heavily in arriving at their conclusions. Five based their judgments in part upon the inferred nature or severity of the behavior disorder, or sometimes upon both. At the same time, three psychologists explicitly denied using criteria common to all pairs, an equal number claimed their standard of judgment to be the same throughout, while the remainder gave no clear indication either way.

⁴ Of the forms returned, six described in detail the criteria used in evaluating each pair, while an additional four outlined the general method of approach used.
On the other hand, the fact of a certain consistency in the accuracy with which some pairs were judged, together with the relationship to direction of change between paired groupings obtained on the Prognostic Rating Scale, suggests that the psychologists in question may actually have utilized more criteria in common in arriving at their decisions than would seem apparent from a scrutiny of the reports submitted. This seeming discrepancy between the empirical findings and the written statements is not surprising, since in a number of instances the psychologist reported considerable difficulty in conveying in so many words the precise reasons for concluding as he did. As one psychologist put it, the final judgment in each instance was more a product of “intuitive feel” derived from a consideration of a multiplicity of performance factors than it was of any clearly definitive aspect of the protocol.

Summary and Conclusions

Twelve well-qualified psychologists were required to choose which of two paired pretreatment Rorschachs in each of 11 such pairs was given by a patient who subsequently improved and which by a patient who remained unimproved. Pairing of protocols was on the basis of an individual matching of patients on [...] variables derived from clinical data. The results lead to the following tentative conclusions:

1. Most clinical psychologists who are required to differentiate Rorschach protocols on a prognostic basis tend, in the long run, to do little better than chance.

2. There appears to be a greater-than-chance consistency among psychologists as regards the accuracy with which such differentiation is accomplished in the case of some protocols.

3. This consistency cannot be explained in terms of specific criteria that are objectively or subjectively identifiable, although evidence for inferring some communality in method of approach is suggested.
These data lend support to certain previous 
findings that the Rorschach, in our present 
state of knowledge, is inadequate as a sole 
measure of how a patient will react to treat- 
ment. However, consideration should be given 
to the limited control over type of treatment 
in the present study and to the consequent 
need for re-examining the Rorschach’s prog- 
nostic effectiveness under conditions which 
focus exclusively upon the use of, say, psy- 
chotherapy or one of the shock therapies. 


Received December 20, 1954. 


References

1. Filmer-Bennett, G. Prognostic indices in the Rorschach records of hospitalized patients. J. abnorm. soc. Psychol., 1952, 47, 502-506.
2. Klopfer, B., Ainsworth, Mary D., Klopfer, W. G., & Holt, R. R. Developments in the Rorschach technique. Vol. 1. Technique and theory. Yonkers-on-Hudson: World Book, 1954.
3. Windle, C. Psychological tests in psychopathological prognosis. Psychol. Bull., 1952, 49, 451-482.




Differential Responses of Normals, Psychoneurotics, 
and Psychotics on Rorschach Determinant Shift¹


Bernard A. Stotsky 


Veterans Administration Hospital, Brockton, Mass. 


The findings of previous investigations (2, 
3, 4, 5) have demonstrated that determinant 
shift, a method for measuring changes which 
occur from the free association to the inquiry 
of the Rorschach, has value as an experimen- 
tal technique for research with the Rorschach 
test. Significant differences for determinant 
shift have been found between psychotics and 
psychoneurotics (4), between patients who 
remain in psychotherapy and those who ter-
minate prematurely,² and between halluci-
nated and deluded patients (5). Those pa-
tients who shift more than others are re- 
garded as more reactive to the stimulus in- 
troduced by the examiner (4, 5). 
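One way to make "determinant shift" concrete is to score each response twice, once after free association and once after inquiry, and count per category how many responses move into or out of it. The Python sketch below does that; the counting rule, the code-to-category mapping, and the record are all hypothetical, and Stotsky's exact scoring procedure may differ.

```python
from collections import Counter

# Hypothetical determinant codes for one record: the scoring after free
# association alone, and the scoring of the same responses after inquiry.
free_assoc = ["F", "F", "F", "FC", "F", "M", "F"]
inquiry    = ["F", "FC", "FY", "FC", "F", "M", "FY"]

# Hypothetical mapping onto the broad categories pure form, total shading,
# and total color; codes not listed (e.g., M) are left uncategorized.
CATEGORY = {"F": "pure form", "FY": "total shading", "YF": "total shading",
            "FC": "total color", "CF": "total color", "C": "total color"}

def shift_by_category(before, after):
    """Count, per category, responses whose category was gained or lost at inquiry."""
    shifts = Counter()
    for b, a in zip(before, after):
        cb, ca = CATEGORY.get(b), CATEGORY.get(a)
        if cb != ca:
            if cb:
                shifts[cb] += 1  # response left this category
            if ca:
                shifts[ca] += 1  # response entered this category
    return shifts

print(shift_by_category(free_assoc, inquiry))
```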

In making these comparisons between 
groups, the question was raised concerning 
possible differences in performance between 
normals and psychotics and between normals 
and psychoneurotics. What differences could 
be expected? 

Two alternate interpretations of determi- 
nant shift have been offered. One is that the 
amount of shift is a rough measure of reac- 
tivity of the subject to a change in the stimu- 
lus (4, 5). Lack of shift is interpreted as in- 
dicating a lack of responsiveness to the change 
in cues provided by the examiner. More sensi- 
tive subjects should show greater shift than 
less sensitive subjects. According to this in- 
terpretation of shift, normals and psycho- 
neurotics should show greater shift than psy- 
chotics. The differences between normals and 
psychoneurotics should be relatively small 
compared with the differences between either 
group and psychotics. 

The second interpretation is that shift in response to a change of instructions on the Rorschach is a measure of adaptability or flexibility (6). Lack of shift is a sign of inability to respond effectively to a changed situation. According to Hutt, Gibby, Milton, and Pottharst, “capacity to shift can become an important differentiating criterion of mental health. The more pathological the individual, the lower would the score in capacity to shift then be” (6, p. 186). If this is so, normals would be expected to show greater shift from free association to inquiry than either psychoneurotics or psychotics.

¹ From VA Hospital, Brockton, Massachusetts.
² L. Salk, personal communication.

The purpose of this study was to deter- 
mine which of these two interpretations of 
shift was more tenable when the type of shift 
being studied was that of determinants. 
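The two interpretations make different predictions about the rank order of the groups, which can be written down as explicit checks. The Python sketch below applies them to the group means reported in Table 1 of this article (the psychotic total-shading mean, .60, is derived from the tabled differences). Reading "relatively small" as smaller than both gaps to the psychotic group is an assumption, and the check compares raw means only, ignoring sampling error and the t tests; it is not the author's analysis.

```python
# Group mean shifts from Table 1 (psychotic total-shading mean derived from differences).
MEANS = {
    "pure form":     {"normal": 4.10, "neurotic": 4.55, "psychotic": 2.35},
    "total shading": {"normal": 2.05, "neurotic": 2.55, "psychotic": 0.60},
    "total color":   {"normal": 2.00, "neurotic": 1.60, "psychotic": 1.35},
}

def fits_reactivity(m):
    """Interpretation 1: both nonpsychotic groups exceed psychotics, and the
    normal-neurotic gap is smaller than either gap to the psychotics."""
    n, ne, p = m["normal"], m["neurotic"], m["psychotic"]
    return n > p and ne > p and abs(n - ne) < min(n - p, ne - p)

def fits_adaptability(m):
    """Interpretation 2: shift falls as pathology rises, normal > neurotic > psychotic."""
    return m["normal"] > m["neurotic"] > m["psychotic"]

for det, m in MEANS.items():
    print(f"{det:>13}: reactivity={fits_reactivity(m)}, adaptability={fits_adaptability(m)}")
```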


Method 


There were three groups: a normal, a psychoneurotic, and a psychotic group. The normal group consisted of 20 subjects selected from a large number of applicants for the positions of hospital aide, kitchen worker, and custodian. Twenty psychoneurotic outpatients at a VA mental hygiene clinic, who had taken Rorschachs during diagnostic screening by the psychiatric team, made up the psychoneurotic group. The psychotics were selected from among inpatients at a VA hospital who bore a diagnosis of a functional psychotic disorder. To ensure comparability of groups, a number of precautions were observed in sampling; these are discussed below.

The three samples had to be homogeneous with respect to such background variables as age, education, occupational level, race, and sex, so that whatever differences were found could not be attributed to such factors.

Care also had to be taken to eliminate from all samples people with known organic pathology of the brain and, from the outpatient sample, ambulatory psychotics, character disorders, and other nonneurotics. In all instances the final psychiatric diagnosis was accepted as the criterion for diagnostic classification.

In selecting normals the following criteria were applied:

a. No previous psychiatric treatment or hospitalization for psychiatric reasons.

b. No disability resulting from nervousness.

c. No record of arrests or court appearances for breaking the law, except for speeding.

d. No history of alcoholism reported.

e. A score on the Taylor Manifest Anxiety Scale below 20.

f. A score on the Manson Evaluation Test below 21.

g. Rating of adequate personality adjustment following a 40-minute interview by the applicant's prospective employer.

h. Satisfactory performance on the job for at least 60 days.
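Criteria a through h act as a conjunctive screen: an applicant is retained only if every condition holds. A minimal Python sketch, with hypothetical field names, and with criterion h read as at least 60 days of satisfactory performance:

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    # Hypothetical fields mirroring criteria a-h.
    prior_psychiatric_care: bool       # a
    nervous_disability: bool           # b
    arrests_other_than_speeding: bool  # c
    alcoholism_history: bool           # d
    taylor_mas: int                    # e: Taylor Manifest Anxiety Scale score
    manson: int                        # f: Manson Evaluation Test score
    interview_adequate: bool           # g
    days_satisfactory: int             # h: days of satisfactory job performance

def qualifies_as_normal(a: Applicant) -> bool:
    """Apply criteria a-h; every one must hold."""
    return (not a.prior_psychiatric_care
            and not a.nervous_disability
            and not a.arrests_other_than_speeding
            and not a.alcoholism_history
            and a.taylor_mas < 20
            and a.manson < 21
            and a.interview_adequate
            and a.days_satisfactory >= 60)
```

Because the criteria are conjunctive, each added requirement can only shrink the pool, which fits the report that over 100 applicants were needed to find 20 suitable cases.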


The first four criteria are self-explanatory. With regard to the Taylor scale, subjects with scores in excess of 19 were eliminated, since less than 23 per cent of the normals studied by Taylor (8) gave scores in excess of 19, while 82 per cent of the psychiatric patients did. On the Manson, a score of 21 or greater is regarded as indicative of severe psychoneurotic and psychopathic traits usually associated with alcoholism (7). The job interview, the seventh criterion, was taken by all subjects prior to psychological testing. The chief of the service or someone delegated by him would conduct a rather lengthy and searching interview to determine the applicant's fitness for his work and for working closely with mental patients. The performance criterion was employed to provide external confirmation of the judgments derived from historical data, objective test scores, and the interview. During the 60-day trial period the employees were observed and rated with respect to competency, personal stability, honesty, reliability, and ability to work with patients. Only those rated satisfactory were included in this sample. To match the psychotic and psychoneurotic samples, only white males were included from among the applicants. These rather extensive criteria for selecting normals involved sampling over 100 applicants before 20 suitable cases could be found.

To control for variability possibly introduced by differences in the number of Rorschach responses (R), the three groups were matched with respect to R. After further sampling, this was finally accomplished. The total R for the three groups differed negligibly, R for psychotics being 372, for psychoneurotics 369, and for normals 371. Comparison of the three groups for age, education, occupation, and IQ scores for cases where test data were available showed no significant differences.


The Rorschach determinants investigated were pure form (F), total shading (TSH), and total color (Tot. C) responses. While scores for individual Rorschach determinants were available, the skewed distributions resulting from the large number of 0 scores for all determinants but FY made analysis of the data by the t test impossible. Later analysis by means of the chi-square test indicated that differences obtained for individual determinants did not add appreciably to the interpretation of the findings obtained from the three determinants listed above. Only scores for these three will be reported here.

Beck’s (1) scoring techniques were employed. The method, previously described (2, 3), of scoring first the free association by itself and then the total response, including both the free association and the inquiry, was utilized. To control for examiner variability, no examiner was allowed to contribute more than seven records to the total sample. Altogether, records obtained by 21 examiners were used. All records, with identifying data removed, were rescored by the author. Reliability in excess of .87 was obtained for his scoring of shift for ten records. One modification was introduced in the scoring: for the so-called “blend” responses (such as FC.Y), only the shift for the primary determinant was scored.

Table 1

Comparison of Normals, Psychoneurotics, and Psychotics on Rorschach Determinant Shift

                              Means                              Differences for means
                                                       Normal-            Neurotic-            Normal-
Determinant       Normal   Neurotic   Psychotic      Psychotic     t     Psychotic     t      Neurotic     t
Pure form          4.10      4.55       2.35           1.75        ?       2.20      4.0**      .45        ?
Total shading      2.05      2.55        .60           1.45        ?       1.95      4.6**      .50        ?
Total color        2.00      1.60       1.35            .65       1.4       .25        ?        .40       1.0

* Differences significant at the .05 level.
** Differences significant at the .01 level.
Results 


Normals vs. psychotics. Normals show sig- 
nificantly more shift than psychotics for F 
and TSH. For Tot. C the difference is not significant.

Psychoneurotics vs. psychotics. Psychoneurotics show significantly more shift for F and TSH. For Tot. C the difference is not significant. These findings support those of the earlier (4) study and serve to validate the previous findings.

Normals vs. psychoneurotics. Although psychoneurotics shift slightly more for F and TSH, and normals more for Tot. C, none of the differences is significant.


Discussion 


The results support the interpretation of 
determinant shift as a rough measure of the 
subject’s sensitivity to a change in the test 
stimulus. The fact that the normal and psy- 
choneurotic groups do not differ significantly 
from each other, while differing significantly 
from psychotics on two of the three variables 
studied, seems to argue against an interpreta- 
tion of shift as a measure of flexibility. The 
form and shading variables which showed the most shift in this study are also the ones which were found by Gibby (2) to be most sensitive to examiner influence. From these findings it can be tentatively concluded that normals and psychoneurotics react in a somewhat similar fashion to the change in the test situation, while psychotics react to a lesser degree than either normals or psychoneurotics. Interestingly enough, the three groups do not differ significantly with respect to shift for color. It may be that the differences between groups in shift for shading and form reflect differences in the type of interpersonal relationship established with the examiners by the subjects. Normals and psychoneurotics are probably more attentive and relate more freely to the examiner than psychotics. Since shading is more susceptible to examiner influence than color, they may be better able than psychotics to elaborate their free-association responses in the inquiry by the use of shading, but not by the use of color.


Relation of shift to three external criteria 


Data pertaining to intelligence, manifest anxiety, and personal adjustment were available for the 20 normal subjects. For these normals, determinant shift correlated +.16 with prorated Wechsler IQ scores, -.29 with Taylor manifest anxiety scores, and +.26 with Manson maladjustment scores. None of these correlations is significant. In this study determinant shift showed no significant relationship to the external criteria of intelligence, maladjustment, and anxiety.
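The statement that none of the three correlations is significant can be checked with the usual t test for a product-moment correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 = 18 degrees of freedom. The Python sketch below assumes a two-tailed test at the .05 level (the article does not say which test or level was applied); the critical value 2.101 is taken from a standard t table, and the names are illustrative only.

```python
import math

T_CRIT_18 = 2.101  # two-tailed .05 critical value of t for 18 df (standard table)


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a product-moment r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Correlations reported above for the 20 normal subjects.
reported = {"Wechsler IQ": 0.16, "Taylor anxiety": -0.29, "Manson maladjustment": 0.26}

for label, r in reported.items():
    t = t_for_r(r, 20)
    print(f"{label}: r = {r:+.2f}, t = {t:+.2f}, significant: {abs(t) > T_CRIT_18}")
# Wechsler IQ: r = +0.16, t = +0.69, significant: False
# Taylor anxiety: r = -0.29, t = -1.29, significant: False
# Manson maladjustment: r = +0.26, t = +1.14, significant: False
```

Solving the same formula for r shows that, with only 20 subjects, |r| would have to exceed roughly .44 to reach the .05 level, so correlations of this size could not be detected as significant in a sample of this size.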


Summary

1. Three groups of 20 subjects, one consisting of psychotics, a second of psychoneurotics, and a third of normals, matched for total number of Rorschach responses (R) and homogeneous with respect to age, education, and occupation, were compared for Rorschach determinant shift from free association to inquiry. The three determinants studied were the number of pure form responses, the total number of shading responses, and the total number of color responses.

2. Normals and psychoneurotics showed significantly greater shift than psychotics for pure form and shading responses. Normals and psychoneurotics did not differ significantly for any of the three variables.

3. The findings were regarded as favoring 
an interpretation of determinant shift as a 
measure of the sensitivity of the subject to 
examiner cues, rather than as a measure of 
flexibility. 


Received January 31, 1955. 


References 


1. Beck, S. J. Rorschach’s test. Vol. I. Basic processes. New York: Grune & Stratton, 1944.

2. Gibby, R. G. Examiner influence on the Rorschach protocol. J. consult. Psychol., 1952, 16, 449-455.

3. Gibby, R. G., & Stotsky, B. A. The relation of Rorschach free association to inquiry. J. consult. Psychol., 1953, 17, 359-364.

4. Gibby, R. G., & Stotsky, B. A. Determinant shift of psychoneurotics and psychotics. J. consult. Psychol., 1954, 18, 267-270.

5. Gibby, R. G., Stotsky, B. A., Harrington, R. W., & Thomas, R. W. Rorschach determinant shift among hallucinatory and delusional patients. J. consult. Psychol., 1955, 19, 44-46.

6. Hutt, M. L., Gibby, R. G., Milton, E. O., & Pottharst, K. E. The effect of varied experimental “sets” upon Rorschach test performance. J. proj. Tech., 1950, 14, 181-186.

7. Manson, M. R. A psychometric differentiation of alcoholics from nonalcoholics. Quart. J. Stud. Alcohol., 1948, 9 (2), 175-206.

8. Taylor, Janet A. A personality scale of manifest anxiety. J. abnorm. soc. Psychol., 1953, 48, 285-290.








Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 





Manifest Anxiety and Rorschach Performance in a 
Chronic Patient Population¹


Leonard D. Goodstein 


State University of Iowa 


and Leo Goldberger²


Mental Health Institute, Mt. Pleasant, Iowa 


As the concept of anxiety has grown in im- 
portance as an explanatory construct in con- 
temporary psychology, the problem of the 
measurement of anxiety has received consid- 
erable attention on the part of psychologists. 
The Taylor Manifest Anxiety Scale (A scale) 
(24) has shown substantial usefulness as a 
measure of anxiety and, consequently, drawn 
much of this attention. 

As Goodstein (9) has pointed out, the vali- 
dation studies of the A scale have been of 
three major, independent types. A number of 
experimental studies have used the A scale to 
measure an individual’s reactivity or excit- 
ability which, in turn, reflects his general 
drive level. These studies have predicted per- 
formance, using Hull’s (13) theoretical for- 
mulation relating response strength to drive, 
in a variety of laboratory experiments, in- 
cluding reaction time, conditioning, verbal 
learning, maze learning, as well as several 
more complex behavioral situations.³

¹ The authors wish to acknowledge the cooperation of Dr. W. B. Brown, Superintendent of the Mental Health Institute, Mt. Pleasant, Iowa, in the collection of the data for this study. The assistance of Dr. I. E. Farber in the preparation of the manuscript, and of Mr. Vernon O. Tyler in the statistical analysis of portions of the data, is also gratefully acknowledged.

² Now at New York University.

³ A mimeographed copy of a bibliography of studies involving the A scale (through January, 1955) may be obtained upon request from the senior author.

Secondly, a number of investigators have shown that the A scale is useful in identifying clinically-diagnosable anxiety states. Several studies (3, 19, 22, 23, 24, 25) have
shown that patient populations give signifi- 
cantly higher A-scale scores than do normal 
populations. There are a number of studies 
(3, 8, 11, 12, 14, 16) that have shown A- 
scale scores to be significantly correlated with 
clinicians’ ratings of anxiety. Gallagher (7) 
has reported that changes in A-scale scores 
are positively correlated with various therapy- 
success criteria. 

The relationships between A-scale scores 
and scores on other tests purporting to meas- 
ure anxiety or other personality variables 
provide a third kind of validity study. Bech- 
toldt (1) and Brackbill and Little (2) have 
reported the intercorrelations between the A 
scale and the other MMPI scales while other 
investigators (7, 10, 16) have reported high 
positive correlations between A-scale scores 
and other empirically derived MMPI anx- 
iety indices (26, 28). Several studies (9, 11, 
27) have reported the relationships between 
A-scale scores and various Rorschach meas- 
ures of anxiety in normal populations with 
equivocal results. 

Although several of the investigations men- 
tioned above have used patients as Ss, these 
have been either outpatients or have been 
hospitalized for only a brief period. The hos- 
pitals used in these studies, moreover, have 
been either acute treatment hospitals or re- 
search hospitals with highly selected patient 
populations. The purpose of the present study 
is to investigate the diagnostic usefulness of 
the A scale with chronic patients in a typical 
midwestern state mental hospital. A second purpose is to investigate the relationships between A-scale scores and various Rorschach indices of anxiety in such a population.

Subjects and Procedures

The subjects (Ss) were psychiatric patients 
hospitalized at the Mental Health Institute in 
Mt. Pleasant, Iowa. While every effort was 
made to choose a random sample of Ss from 
the total patient population of 1,300, some 
selective factors such as degree of disturb- 
ance, negativism, literacy, etc. were inevitably 
operating. 

The Taylor A scale was administered to 
166 randomly selected patients, but 27 Ss 
were eliminated as they had answered fewer 
than 45 of the 50 test items. This left a total 
of 139 Ss, 84 males and 55 females. 

The tests, in mimeographed form, were ad- 
ministered to the Ss in small groups of four 
to ten in the ward day rooms. An experimenter (E) was present to answer any questions, e.g., concerning vocabulary, recording
of responses, etc., that might be raised. The 
instructions were phrased in very simple lan- 
guage and all Ss were assured that the test 
results would not affect their hospitalization 
in any way. 

Psychiatric diagnoses were available for all 
Ss as part of their hospital record. The Ss’ 
age range was from 18 to 85 years (mean = 
43.1, SD = 12.4) while the range of length 
of hospitalization was from less than one year 
to more than 37 years (mean = 4.6, SD = 6.9). In addition, each S’s Rorschach record
was available in the files of the Psychology 
Department of the Institute. 


Results and Discussion 
Manifest Anxiety and Psychiatric Diagnosis 


The A-scale means, the SD’s, and the N’s of cases for each diagnostic group, by sex, as well as the over-all means and SD’s, are presented in Table 1. The over-all mean A-scale score for the 139 Ss is 16.9, which is much closer to the mean reported for college students than to that usually reported for psychiatric patients (3, 19, 23, 24, 25). These previous studies, however, have typically used recent admissions or outpatients, while the mean length of hospitalization in the present investigation was four and a half years, indicating a much more chronic patient population. A significant negative correlation of -.19 (p < .01) was found in the present study between A-scale scores and length of hospitalization, suggesting that a lower anxiety level may result from such lengthy hospitalization.

An analysis of A-scale scores by psychiatric diagnoses revealed that the mean for the psychoneurotic Ss is significantly higher (p < .01) than the mean for any other diagnostic group. There are no other significant mean differences among the diagnostic groups. The psychoneurotics had spent less time in the hospital (p < .01) than had the other patients, suggesting that the diagnosis of psychoneurosis is confounded with length of hospitalization, at least in the present study. There was not a sufficiently large group of either chronic neurotics or recently admitted nonneurotics to permit further investigation of this aspect of the problem.


Table 1

Distribution of A-Scale Scores by Diagnosis and Sex

                                        Females              Males                Total
Diagnosis                            N   Mean    SD      N   Mean    SD      N   Mean    SD
Schizophrenia                       24     ?    10.2    39   15.4    8.6    63   16.2    9.3
Chronic brain syndrome              10     ?    10.7    17     ?      ?     27     ?      ?
Manic-depressive psychosis           5   16.0    6.2     3    7.0    3.9     8   12.6    6.6
Involutional psychosis               8   16.4    9.9     2    9.5    7.6    10   15.0    9.9
Psychoneurosis                       8   27.5    9.4     7   25.4    8.7    15   26.5    9.1
Behavior and character disorders     0     -      -     16   14.4    7.9    16   14.4    7.9
Total                               55     ?      ?     84   15.1    8.9   139   16.9    9.8
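For each row of Table 1, the Total mean should equal the N-weighted average of the female and male means. The Python sketch below runs this consistency check on the three rows whose entries are all legible. It is a reader's check added here for illustration, not part of the authors' analysis; the names `rows` and `combined_mean` are hypothetical.

```python
# Rows of Table 1 with all entries legible:
# (female N, female mean, male N, male mean, reported total mean)
rows = {
    "Manic-depressive psychosis": (5, 16.0, 3, 7.0, 12.6),
    "Involutional psychosis": (8, 16.4, 2, 9.5, 15.0),
    "Psychoneurosis": (8, 27.5, 7, 25.4, 26.5),
}


def combined_mean(n1, m1, n2, m2):
    """Mean of two subgroups pooled together, each mean weighted by its N."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


for name, (nf, mf, nm, mm, reported) in rows.items():
    print(f"{name}: computed {combined_mean(nf, mf, nm, mm):.3f}, reported {reported}")
```

Each computed value rounds to the tabled Total mean. The same relation applied to the column totals (84 males, 55 females) ties the over-all mean of 16.9 to the sex-specific means.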


While the rather low A-scale means found 
in the present investigation may be surpris- 
ing, the finding that chronic patients, some of 
whom have spent most of their adult life in 
the hospital, are not as anxious as nonchronic 
patients who may be seeking psychiatric help 
for the first time, is certainly understandable. 
These present findings would suggest that the 
A scale is only of limited usefulness in typi- 
cal state hospital psychodiagnostic work, be- 
ing primarily valuable in the identification of 
nonchronic, psychoneurotic Ss. 

The over-all A-scale mean for women is 
significantly higher (p < .01) than the over- 
all mean for men. While this finding has been 
reported by Taylor (24), the difference be- 
tween the male and female means was not 
this reliable. The female mean is higher than 
the male mean for each diagnostic category 
that occurs in both sexes. This would suggest 
that women are more anxious than men re- 
gardless of diagnosis, and that the two dis- 
tributions should not be combined. 


Manifest Anxiety and Rorschach Anxiety Indices


Since the A scale has usually been found to 
be nondiscriminating over the middle range 


of scores, two groups of Ss (high anxious [Hi A] and low anxious [Lo A]) were selected from the extremes of the A-scale distributions for the two sexes. Each group consisted of 16 Ss, seven females and nine males, whose A-scale scores fell, respectively, within the upper and lower 12 per cent of scores in each distribution. The cutting scores were as follows: Hi A males > 29; Hi A females > 33; Lo A males < 7; Lo A females < 8. The psychoneurotics were excluded from this portion of the analysis to avoid confounding anxiety with psychiatric diagnosis. The performance of these two groups of Ss on the Rorschach was then compared.

All Rorschachs had been scored following Klopfer et al. (15), and the location, determinant, and content scores were then expressed as a percentage of total responses. The means and SD’s for 17 Rorschach indices for the two groups, as well as the t values and probability estimates of the mean differences, are presented in Table 2. These particular indices were selected as typical Rorschach indices of anxiety (15) or because they had been previously investigated in studies of this type (4, 11, 27).

Table 2

Comparison of 17 Rorschach Indices

                                           High anxious Ss    Low anxious Ss
                                              (N = 16)           (N = 16)
Index                                        Mean     SD       Mean     SD       t*       p
Total no. of responses (R)                   23.3      ?       18.4    11.6     1.25     .07
% of whole (W) responses                     25.3    17.1      41.8    23.1     2.22     .05
% of diffuse shading (K) responses            4.9     4.2       8.6    11.3     1.16†
% of surface shading (c) responses            9.6     8.9       3.4     3.7     2.55†    .03
% of weighted color (C) responses            11.5     9.3       8.2     7.8     1.06
% of achromatic color (C') responses          1.7     2.8       6.1     9.7     1.67†
% of movement (M, FM, m) responses           19.1    15.2      24.2     7.2     0.85
% of form-primary responses                  17.9    12.3      16.1    14.7     0.37
% of form-secondary responses                10.6     8.5       8.8     8.5     0.6
% of no form responses                        3.5     5.8       1.7     3.3     1.04
% of poor form (F-) responses                20.5    20.1      22.1    20.6     0.21     ns
% of popular (P) responses                   13.9     5.9      17.2    11.4     0.98†    ns
% of RCT Anxiety responses                   41.9    24.5      26.3    20.6     1.89     .04‡
% of RCT Hostility responses                 10.9    11.2       2.8     4.6     2.63†    .02
% of anatomical and sex responses             6.2    10.6       6.5    10.1     0.09
No. of reaction times > 15"                   6.7     2.2       5.4     2.1     1.71     .05
No. of rejections                             0.5     0.9       1.3     2.2     1.23†

* df = 30 in each case, except where indicated.
† 15 df used in estimating t, as the variances are heterogeneous.
‡ Using a single-tailed hypothesis.

Table 3

Comparison of the Significant Results of Three Investigations of Response-defined Anxiety and Rorschach Indices of Anxiety

Rorschach Index    |    Investigator: Goodstein & Goldberger    Cox & Sarason (4)    Westrope (27)
Total no. of responses    Hi > Lo    Hi = Lo    Hi > Lo*
Whole responses    Lo > Hi    Lo > Hi    Lo = Hi
Diffuse shading responses    Hi = Lo    Hi = Lo
Surface shading responses    Hi > Lo    Lo > Hi    Hi > Lo*†
Movement responses    Hi = Lo    Hi > Lo
% of poor form responses    Hi = Lo    Hi > Lo    Hi = Lo
RCT Anxiety responses    Hi > Lo    Hi > Lo
Popular responses    Hi = Lo    Lo > Hi
No. of reaction times > 15"    Hi > Lo    Hi > Lo

* Westrope used a square-root transformation of these scores rather than the percentage transformation used by the others.
† Westrope used Hertz's weighted shading sum, which included both diffuse and surface shading responses (26, p. 517).

The data presented in Table 2 indicate that the Hi A group, as compared with the Lo A group, give a significantly higher percentage (p < .03) of surface shading (c) responses, significantly higher percentages of Elizur (5) Rorschach Content Test (RCT) anxiety (p < .04) and hostility (p < .02) responses, and significantly more reaction times exceeding 15 seconds (p < .05). The Lo A group give a significantly higher percentage (p < .05) of whole (W) responses. The Hi A group also give a larger total number of responses, but this finding was significant at only a low level of confidence (p < .07). No other differences between the two groups are statistically significant. A further analysis of these data, using the Mann-Whitney U test (18), which does not necessitate the assumption of normality of the distributions, yielded results identical with those in Table 2.
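The t values in Table 2 can be approximately recomputed from the tabled means and SD’s. The authors do not give their exact formula, so the Python sketch below assumes the common form (difference between means divided by sqrt(s1^2/n1 + s2^2/n2)), which for two groups of 16 coincides with the pooled-variance t. Because the inputs are rounded and it is not stated whether the SD’s used N or N - 1, small discrepancies are expected. The sketch also includes a direct pair-counting form of the Mann-Whitney U statistic (18). It computes statistics only, not p values, and the function names are illustrative.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Difference between two independent means over its standard error.

    With equal group sizes this equals the pooled-variance t.
    """
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def mann_whitney_u(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, ties counted 1/2.

    U for x plus U for y always equals len(x) * len(y).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Surface shading (c) percentages from Table 2: Hi A vs. Lo A, N = 16 each.
t_c = t_from_summary(9.6, 8.9, 16, 3.4, 3.7, 16)
print(round(t_c, 2))  # 2.57, close to the 2.55 reported in Table 2
```

Under the footnoted convention for heterogeneous variances, this t would be referred to 15 rather than 30 df. The two-tailed .05 critical value for 15 df is 2.131.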

The present findings with regard to the 
Rorschach would suggest that the A scale 
and these Rorschach indices do not yield en- 
tirely similar measures of anxiety, at least in 
a chronic patient population, although both 
are response-defined measures. For instance, 
Klopfer et al. (15, p. 269) insist that responses involving diffuse shading (K) are
the important indicators of anxiety. In the 
present study, the Lo A group gave a higher 
percentage of K responses than the Hi A 
group, although the difference was not sig- 
nificant. In light of this finding, one can 
scarcely argue that both K responses and 


high A-scale scores indicate the presence of 
anxiety, at least not the same “kind” of 
anxiety. 

Although three previous studies (4, 11, 27) 
have compared the Rorschach performance of 
high and low anxious Ss, using response-defined measures of anxiety, the study by Holtzman et al. (11) did not use the standard Rorschach cards or follow the usual administrative procedures and, therefore, did not report comparable results. As Westrope (27) has pointed out, response-defined anxiety, such as that measured by the A scale, is quite different from “stress-produced” anxiety and,
consequently, the studies of Eichler (6) and 
others who have used stress-produced anxiety 
will not be considered here. 

The results reported by Cox and Sarason 
(4), and Westrope (27), are compared with 
the present findings in Table 3. While the 
earlier investigators have used college students as Ss, the obtained A-scale scores⁴ from
the psychiatric patients used in the present 
study are comparable to those for college Ss, 
suggesting that the results may be compared, 
although such comparisons require consid- 
erable caution. 


⁴ While Westrope used the Taylor A scale as her response-defined anxiety measure, Cox and Sarason used the Test Anxiety Questionnaire (17, 20, 21). Dr. George Mandler of Harvard University, in a personal communication, however, reports a highly significant correlation of .59 between scores on the Test Anxiety Questionnaire and scores on the A scale.













While the three studies do not completely 
agree with respect to any single finding, sug- 
gesting that investigators in this area should 
regard their results as tentative prior to cross-validation, certain consistencies do emerge.
The present writers and Cox and Sarason have 
reported that Lo A Ss give more whole re- 
sponses than the Hi A Ss and Westrope’s re- 
sults, although not significant, are in the same 
direction. The Hi A group gave more RCT 
anxiety responses and more total responses in 
two of the three investigations. Cox and 
Sarason did not use the RCT but they did 
find that their Hi A group gave more total 
responses than their Lo A group, although the 
difference was not significant. Both of these 
findings have also been reported by Goodstein 
(9), who used a group presentation of the Rorschach. The two studies that investigated heightened reaction times (> 15") also reported that the Hi A Ss gave more such responses than the Lo A Ss.

Neither Cox and Sarason nor the present 
investigators were able to find any difference 
in the frequency of diffuse shading responses 
between the two groups although, as noted 
above, such responses have been accepted as 
the most significant Rorschach index of anx- 
iety. The numerous studies cited above re- 
porting significant relationships between A- 
scale scores and clinically diagnosable anxiety 
states provide sufficient evidence, at least for 
the present authors, for the validity of the A 
scale as a measure of clinical anxiety. The 
failure in these two studies to find signifi- 
cant relationships between diffuse shading re- 
sponses and A-scale scores casts considerable 
doubt on the interpretation of such shading 
responses as indicative of manifest anxiety. 
The findings with respect to surface shading, 
poor form, movement, and popular responses 
are rather equivocal and strongly suggest the 
necessity for additional research. Certainly 
many of the statements involving the rela- 
tionships between these indices and the pres- 
ence or absence of anxiety require re-exami- 
nation in terms of these findings. 


Summary and Conclusions 


The purposes of the present study were to 
investigate the diagnostic usefulness of the 
Taylor A scale in a chronic patient population, and to study the relationships between
A-scale scores and Rorschach anxiety indices 
in such a population. 

The A scale was administered to 139 Ss in 
a state mental hospital. The over-all mean A-scale score was comparable to that usually reported
for college Ss, with only the psychoneurotic 
Ss giving a significantly higher mean. Some 
evidence that anxiety tends to be reduced 
with lengthy hospitalization was also found. 
A significant sex difference was also reported 
with the female A-scale means higher than 
the male means, regardless of diagnosis. 

Two groups of 16 Ss each, high and low 
anxious, were selected from the upper and 
lower 12 per cent of the A-scale distributions 
for the two sexes, and their Rorschach per- 
formances were then compared. The psycho- 
neurotic Ss were excluded from this portion 
of the analysis to avoid confounding anxiety 
and psychiatric diagnosis. The two groups 
were significantly different on 6 of the 17 
Rorschach indices investigated; some of these 
differences were, however, in direct contradic- 
tion to previously reported results. 

The present results were consistent with the 
conclusion that the A scale is of some useful- 
ness in typical mental hospitals in the differ- 
entiation of nonchronic, neurotic patients. It 
can be further concluded that only a few of 
the so-called anxiety indices on the Rorschach 
are related to response-defined measures of 
anxiety. The most reliable findings suggest 
that anxious Ss give more total Rorschach 
responses, more RCT anxiety responses, more 
reaction times exceeding 15 seconds, and 
fewer whole responses. These results would 
seem to indicate the potential usefulness of 
additional research with response-defined per- 
sonality measures and projective techniques. 


Received February 21, 1955. 


References 


1. Bechtoldt, H. P. Response defined anxiety and MMPI variables. Proc. Iowa Acad. Sci., 1953, 60, 495-499.

2. Brackbill, G., & Little, K. B. MMPI correlates of the Taylor Scale of Manifest Anxiety. J. consult. Psychol., 1954, 18, 433-436.

3. Buss, A., Weiner, M., Durkee, Ann, & Baer, M. The measurement of anxiety in clinical situations. J. consult. Psychol., 1955, 19, 125-129.

4. Cox, F. N., & Sarason, S. B. Test anxiety and Rorschach performance. J. abnorm. soc. Psychol., 1954, 49, 371-377.

5. Elizur, A. Content analysis of the Rorschach with regard to anxiety and hostility. J. proj. Tech., 1949, 13, 247-284.

6. Eichler, R. M. Experimental stress and alleged Rorschach indices of anxiety. J. abnorm. soc. Psychol., 1951, 46, 344-355.

7. Gallagher, J. J. Manifest anxiety changes concomitant with client-centered therapy. J. consult. Psychol., 1953, 17, 443-446.

8. Gleser, Goldine, & Ulett, G. The Saslow screening test as a measure of anxiety proneness. J. clin. Psychol., 1952, 8, 279-283.

9. Goodstein, L. D. Interrelationships among several measures of anxiety and hostility. J. consult. Psychol., 1954, 18, 35-39.

10. Holtzman, W. H., Calvin, A. D., & Bitterman, M. E. New evidence for the validity of Taylor’s Manifest Anxiety Scale. J. abnorm. soc. Psychol., 1952, 47, 853-854.

11. Holtzman, W. H., Iscoe, I., & Calvin, A. D. Rorschach color responses and manifest anxiety in college women. J. consult. Psychol., 1954, 18, 317-324.

12. Hoyt, D. P., & Magoon, T. M. A validation study of the Taylor Manifest Anxiety Scale. J. clin. Psychol., 1954, 10, 357-361.

13. Hull, C. Principles of behavior. New York: D. Appleton-Century, 1943.

14. Kendall, E. The validity of Taylor’s Manifest Anxiety Scale. J. consult. Psychol., 1954, 18, 429-432.

15. Klopfer, B., et al. Recent developments in the Rorschach technique. Vol. I. Yonkers, New York: World Book, 1954.

16. Lauterbach, C. G. Empirical study of the Manifest Anxiety Scale and its relationships to other clinical measures of anxiety. Unpublished doctor’s thesis, State Univer. of Iowa, 1952.

17. Mandler, G., & Sarason, S. B. A study of anxiety and learning. J. abnorm. soc. Psychol., 1952, 47, 161-173.

18. Mann, H. B., & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. math. Statist., 1947, 18, 50-60.

19. Sampson, H., & Bindra, D. “Manifest” anxiety, neurotic anxiety, and the rate of conditioning. J. abnorm. soc. Psychol., 1954, 49, 256-259.

20. Sarason, S. B., & Gordon, E. M. The test anxiety questionnaire: scoring norms. J. abnorm. soc. Psychol., 1953, 48, 447-448.

21. Sarason, S. B., & Mandler, G. Some correlates of test anxiety. J. abnorm. soc. Psychol., 1952, 47, 810-817.

22. Spence, K. W., & Taylor, Janet A. The relation of conditioned response strength to anxiety in normal, neurotic, and psychotic subjects. J. exp. Psychol., 1953, 45, 265-272.

23. Taffel, C. Conditioning of verbal behavior in an institutionalized population and its relation to “anxiety level.” Unpublished doctor’s thesis, Indiana Univer., 1952.

24. Taylor, Janet A. A personality scale of manifest anxiety. J. abnorm. soc. Psychol., 1953, 48, 285-290.

25. Taylor, Janet A., & Spence, K. W. Conditioning level in the behavior disorders. J. abnorm. soc. Psychol., 1954, 49, 497-502.

26. Welsh, G. S. An anxiety index and an internalization ratio for the MMPI. J. consult. Psychol., 1952, 16, 65-72.

27. Westrope, Martha R. Relations among Rorschach indices, manifest anxiety, and performance under stress. J. abnorm. soc. Psychol., 1953, 48.

28. A scale of neuroticism: an adaptation of the Minnesota Multiphasic Personality Inventory. J. clin. Psychol., 1951, 7, 117-122.






















The Reliability and Validity of the Rotter 
Incomplete Sentences Test 


Ruth Churchill 
Antioch College 


and Vaughn J. Crandall 


Fels Research Institute for the Study of Human Development

The Rotter Incomplete Sentences Blank (ISB) has been suggested for several uses. The authors of the ISB have stated that “the test might be given to incoming college freshmen, or to special classes to determine which students are in need of personal help” (4, p. 348), and “The test appears to be promising for use with college students for a variety of screening and experimental problems when a measure of degree of conflict or maladjustment is required” (4, p. 355).

To the best of the present writers’ knowledge, there have been no published studies of the reliability and validity of the ISB at colleges other than Ohio State University, where the test was developed. In addition, as reviews of the ISB in the Fourth Mental Measurements Yearbook by Cofer (2, pp. 243-244) and by Schofield (2, pp. 244-245) have noted, the test-retest reliability of the ISB has yet to be evaluated. In the present study ISB data were gathered to answer these and other questions:

1. What is the ISB interscorer reliability among scorers with limited psychological experience who are not trained by the authors of the test? The ISB manual reports interscorer reliabilities of .96 for female ISB records and .91 for male records (5, p. 7). The scorers had considerable psychological training and were trained by the authors. Whether such high agreement would be found among scorers with less psychological experience and who had not been trained by the ISB authors remains an open question. Such a question is important since institutions using the ISB must rely exclusively on the ISB manual for scorer training, and may not have the services of an experienced clinical psychologist as a scorer.


2. How consistent is ISB behavior over varying periods of time? An evaluation of the test-retest reliability of the ISB is necessary. If the test is to be used as a screening instrument at college entrance to predict later difficulties, it must measure relatively stable personality characteristics rather than temporary moods or reactive states. If the test is to be used in research, the relationship of ISB performance to other variables may depend on the reliability of the ISB. This is particularly true when the ISB has been administered at a different time from the measurement of the other variables.

3. What are the effects of having previously taken the ISB on retest performance? If ISB’s are to be used as a measure of change in adjustment, the effect of the original test performance on retest performance should be known.

4. Are the normative data presented in the ISB manual applicable to students in colleges other than the one from which the normative data were drawn? The manual presents normative distributions of ISB scores for male and female college freshmen at Ohio State University. It is not yet known how far these norms apply to students in other colleges and universities.


5. How well does the ISB identify adjusted and maladjusted individuals? If colleges are to use the ISB to discover students likely to encounter difficulties in adjustment, it is important to determine whether students who earn high scores on the test at college entrance later show such difficulties.


Method 
Subjects 


College students and adult women made 
up the two major samples of the present 
study. All of the college students in the study 
were attending Antioch College where ISB’s 
have been regularly administered to all in- 
coming students as part of their entering bat- 
tery of placement tests. Antioch College is a 
small coeducational liberal arts college with 
a program which alternates periods of aca- 
demic study with off-campus work. The stu- 
dent body comes from all parts of the United 
States with the largest number, 36 per cent, 
coming from the Middle Atlantic states. 

The adult women in the study were all 
members of a long-term longitudinal research 
project at the Fels Research Institute for the 
Study of Human Development. All of these 
women were married and were mothers of one 
or more children. Most of them were between 
35 and 45 years of age. Their educational 
backgrounds and intellectual levels were con- 
siderably higher than national averages. All 
of these Ss were middle class, with a pre- 
ponderance of them belonging to the lower- 
middle socioeconomic class. 


Scoring 


Three individuals scored the college stu- 
dent ISB’s. Two were senior students at 
Antioch College majoring in psychology. The 
third scorer had a B.A. degree in psychology. 
The ISB’s of the mother sample were scored by two persons, both of whom had B.A. degrees in psychology.¹ Thus none of the scorers
had had graduate training in psychology or 
any extensive psychological experience. These 
scorers were trained by one of the authors 
(VJC) using the directions and scoring ex- 
amples of the ISB manual (5). 

The college student ISB’s were scored in strict accordance with the ISB manual. Certain modifications were necessary, however, in the scoring of the ISB’s of the mother sample. The mothers were given the adult form of the ISB rather than the college form. Since no manual exists for the adult form, the mothers’ ISB’s were scored with the college-form manual, and the examples in that manual were used whenever possible. However, the scorers were instructed to score in terms of the general instructions for each sentence whenever the specific examples for that sentence were judged to be inapplicable to adult females.

¹The authors would like to express their gratitude to Misses Ruth Kamrass and Barbara Stunden, and to Mrs. Piero Bellugi, who scored the college student ISB’s, and to Mrs. Toby Helfand and Mrs. Piero Bellugi, who scored the mother ISB’s.


Sampling 


For estimations of interscorer reliability, 
three scorers independently rated 40 ISB’s se- 
lected randomly from Antioch College fresh- 
men women students entering in 1953. Two 
scorers independently scored 45 ISB’s of the 
mother sample. 

College student samples were used to evalu- 
ate ISB test-retest reliability over varying in- 
tervals of time. All students originally had 
been tested at entrance to college. Six groups 
of students were retested. One group of men 
and one group of women were retested at six 
months. One group of men and one group of 
women were retested after a one-year interval. 
One group of men and one group of women 
students were retested three years after their 
original testing. Sixty Ss were randomly 
chosen for each of these six groups and were 
asked by mail to participate in a group re- 
test session. Students who were unable to at- 
tend this session (approximately one-third) 
were tested individually. About 5 per cent of 
the students whose participation was re- 
quested could not be contacted, and another 
8 per cent refused to participate. Forty-five 
retest ISB’s for each of the six groups were 
selected at random from those obtained in the 
retesting sessions. Each of the three scorers 
scored 15 Ss’ original test and retest ISB’s in 
each of the six groups. Original test ISB’s and 
retest ISB’s were mixed together and all 
identifying data removed. 

The procedure for evaluating ISB test-retest reliability in the mother sample was similar to that used in the college student sample, except that the retest interval for the mothers varied somewhat from individual to individual. For the 39 mothers used in this reliability study, the retest interval ranged from 13 to 24 months, with a mean of 17 months and a median of 20 months. All original and retest ISB’s were administered to the mothers in individual sessions.

To evaluate the screening validity of the 
ISB for college students, a list was compiled 
of all college students requesting psychologi- 
cal counseling in the last four years who had 
entered counseling within two years of the 
time of their original ISB. (It seemed unrea- 
sonable to expect ISB scores to predict be- 
havior more than two years in the future.) 
There were 65 women and 24 men in this 
counseling group. The ISB scores of this 
group were compared with the scores of a 
noncounseling group consisting of all students 
in the test-retest study who had not under- 
gone psychological counseling (123 women 
and 132 men). 

In the mother sample, the validity of the 
ISB was evaluated by comparing the ISB 
scores of 44 mothers with a clinical psychologist’s ratings of the personal-social adjustment of these mothers.² The psychologist, a Fels
Institute Home Visitor, had routinely visited 
these mothers in their homes at least twice a 
year as part of the Fels longitudinal study of 
maternal behavior to observe these mothers in 
interaction with their children. In addition, 
the mothers had, at various times, been in- 
terviewed by the Home Visitor concerning 
their relationships, not only with their chil- 
dren, but also with their husbands, with their 
friends, and with the community at large. On 
the basis of this information, the Home Visi- 
tor rated the personal-social adjustment of 
the mothers, using a graphic rating scale spe- 
cifically constructed for the present study. 


²Mrs. Anne Preston was the Fels Home Visitor who rated the adjustment of the mothers.

Results
Interscorer Reliability

In a student sample (original ISB’s of 40 randomly selected freshmen women), the product-moment correlations between scorers were as follows: Scorers A and B, .94; Scorers B and C, .94; Scorers A and C, .95. Interscorer agreement on 45 ISB’s of the mother sample was .98. These reliabilities compare favorably with the correlation of .96 for college female ISB’s reported in the ISB manual.

Table 1 summarizes the means and standard deviations of the scores that the three scorers assigned to the male and female college student ISB’s. Analyses of variance were performed on the means, and Bartlett’s test for homogeneity of variance (1, pp. 141-144) was run on the variances. The analyses of variance yielded the following F’s: original tests of female Ss, .75; retests of female Ss, .60; original tests of male Ss, .08; and retests of male Ss, 2.19. None of these F’s was significant. Likewise, none of the tests for homogeneity of variance yielded significant differences. The Q/1 values were: original tests of female Ss, 2.74; female retests, 3.70; original tests of male Ss, 1.35; and male retests, 0.80. Thus, in no group (male or female, original test or retest) was there a significant difference among the means or variances of the scores assigned by the three scorers. These results are promising. They suggest that high interscorer agreement can be obtained among ISB scorers with minimal psychological training who have been trained exclusively with the ISB manual.

Table 1
Means and Standard Deviations of Incomplete Sentences Blank Scores for College Group, by Scorers

                   Women            Men
Scorer          Mean     SD      Mean     SD
First tests
  Scorer A     131.6   14.80    128.5   16.28
  Scorer B     127.9   17.63    129.3   13.67
  Scorer C     131.1   13.89    130.1   15.30
Retests
  Scorer A     133.2   20.95    132.4   17.12
  Scorer B     128.8   24.08    133.5   15.03
  Scorer C     133.0   17.97    126.8   16.67
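Bartlett’s test compares several group variances. It measures how far the log of the pooled variance exceeds the weighted average of the logs of the separate variances. The minimal Python sketch below works from summary statistics. It assumes that the reported Q/1 values are Bartlett’s corrected statistic, referred to chi-square on k - 1 degrees of freedom. It also assumes that each scorer rated 45 female original tests, inferred from the Sampling section (15 Ss in each of three women’s groups). The function name is illustrative.

```python
import math


def bartlett_statistic(sizes, sds):
    """Bartlett's corrected chi-square statistic for equality of k variances.

    sizes: number of scores in each group (n_i)
    sds:   standard deviation of each group, treated as an unbiased estimate
    Refer the result to a chi-square distribution on k - 1 degrees of freedom.
    """
    k = len(sizes)
    if k < 2 or len(sds) != k:
        raise ValueError("need at least two groups and one SD per group")
    dfs = [n - 1 for n in sizes]              # within-group degrees of freedom
    variances = [s * s for s in sds]
    df_total = sum(dfs)                        # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    # Uncorrected statistic: grows as the group variances diverge
    raw = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    # Bartlett's small-sample correction factor
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return raw / correction


# Women's first tests, one entry per scorer (SDs from Table 1; n = 45 assumed)
print(round(bartlett_statistic([45, 45, 45], [14.80, 17.63, 13.89]), 2))
```

With these assumed inputs the sketch gives about 2.73, close to the reported 2.74. The small gap may reflect rounding of the tabled SDs or a different variance convention.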


Test-retest Effects on Means and Variances 


Table 2 gives the test-retest data. The mean test-retest differences did not differ significantly from zero: the obtained F of 1.35 was not significant for 6 and 264 degrees of freedom. However, the variances of these differences were not equal from group to group. Bartlett’s test for homogeneity of variance yielded a Q/1 of 11.02, significant at the .05 level for five degrees of freedom. Inspection suggests that this finding is due to the larger retest variances for women after one and three years. The combined hypothesis was then tested by Fisher’s procedure for combining two independent probabilities (3, p. 104). This hypothesis states that the mean test-retest differences equal zero and that the variances of these differences are homogeneous. The result was not significant. In general, then, retest performance appears to show no consistent pattern of difference from original test performance.

Table 2
Test-retest Reliability of the Incomplete Sentences Blank for College Students

                            Original tests      Retests
Group                       Mean     SD       Mean     SD       r
Women, 6-month retest      129.9   15.88     131.9   15.81     .54
Women, 1-year retest       127.4   14.43     130.7   21.46     .50
Women, 3-year retest       133.3   15.36     132.3   24.82     .44
Men, 6-month retest        127.8   15.00     133.4   16.44     .43
Men, 1-year retest         128.5   13.53     125.7   16.62     [illegible]
Men, 3-year retest         132.0   16.37     133.6   15.18     .38
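Fisher’s procedure combines k independent probabilities through X = -2 Σ ln p_i. When every null hypothesis is true, X has a chi-square distribution on 2k degrees of freedom. The Python sketch below illustrates this standard result. The text does not report the separate probabilities for the F test and for Bartlett’s test, so the values in the example call are hypothetical, and so is the function name.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: returns (X, combined p) with X = -2 * sum(ln p_i).

    Under the joint null hypothesis, X is chi-square on 2k degrees of freedom.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need one or more p-values in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even df = 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x, math.exp(-half) * total


# Hypothetical probabilities for a means test and a variance test
print(fisher_combined_p([0.20, 0.05]))
```

For two probabilities the combined value reduces to q(1 - ln q), where q is their product. This is what the loop computes when k = 2.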


Test-retest Reliability 


Table 2 also summarizes the test-retest reliability data. The reliability coefficients, ranging from .38 to .54, are not very satisfactory. They indicate that if the ISB is used for experimental problems, as the test authors suggest, there should be no great time lapse between ISB administration and the measurement of experimental variables to be correlated with ISB performance. However, all correlations were significantly different from zero beyond the .01 level of confidence, indicating that the ISB measures more than momentary moods or reactive states.
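Whether a test-retest coefficient differs from zero is usually judged with t = r √((n - 2)/(1 - r²)) on n - 2 degrees of freedom. The sketch below computes the product-moment r from paired scores and then this t. It assumes 45 retested students per group, as described under Sampling, and leaves the choice between one- and two-tailed critical values to the reader. The function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between paired test and retest scores."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three complete pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t on n - 2 degrees of freedom for testing the hypothesis rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Lowest coefficient in Table 2, with n = 45 assumed
print(round(t_for_r(0.38, 45), 2))
```

For r = .38 and n = 45, t_for_r gives about 2.69 on 43 degrees of freedom. Compare this value with tabled critical values of t.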

The test-retest reliability of ISB performance in the mother group was also evaluated. Their original test mean was 132.6, with a standard deviation of 18.01; their retest mean and standard deviation were 135.5 and 20.48, respectively. The mothers had a median interval of twenty months between testings. The correlation between their test and retest scores was .70, considerably higher than any of the test-retest correlations of the college women. No exact reason can be given for the greater test-retest reliability of the mothers’ ISB’s, but there are at least two likely explanations. First, the test and retest situations for the mothers were exactly the same. The college students, by contrast, wrote their original ISB’s as part of a required series of placement tests given when they arrived on campus, but took the retest voluntarily. Second, the environments of the two samples differed. Most of the mothers’ lives can be assumed to have been relatively stable. The college students, however, were moving from home environments to college and job environments that entailed many new experiences and adjustments.


Normative Data 


The means and standard deviations of the original tests in Table 2 also serve to confirm the normative means and standard deviations reported by the test authors in their manual. They reported a mean of 127.4 and a standard deviation of 14.4 for 85 freshmen women, and a mean of 127.5 and a standard deviation of 14.2 for 214 freshmen men, at Ohio State University (5, p. 11). In the present study, an analysis of variance was run on four samples of women: Rotter’s sample and the three Antioch College original test samples. The analysis yielded an F value of 1.13, which was not significant. Bartlett’s test for homogeneity of variance for the four samples yielded a nonsignificant Q/1 value of 3.32. An analysis of variance on Rotter’s male sample and the three Antioch College samples of men yielded a nonsignificant F of 0.55. Bartlett’s test for homogeneity of variance on these samples yielded a nonsignificant Q/1 of 2.00. These ISB data come from students of two quite different colleges: a large state university and a small liberal arts college with a work-study program. Their comparability suggests that the norms presented in the ISB manual may prove applicable in a variety of college settings.
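An analysis of variance like the one on the four women’s samples can be computed from published summaries alone: group sizes, means, and standard deviations. The sketch below assumes that the SDs use the n - 1 denominator. This section does not state the exact size of each Antioch original-test sample or which variance convention the authors used. The sketch is therefore not meant to reproduce the reported F of 1.13. `anova_from_summary` is a hypothetical helper.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F from group sizes, means and SDs (n - 1 denominator assumed).

    Returns (F, df_between, df_within).
    """
    k = len(ns)
    if k < 2 or not len(means) == len(sds) == k:
        raise ValueError("need at least two groups with matching summaries")
    total_n = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares recovered from each group's SD
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, total_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy check: two groups of two scores each, means 0 and 2, SDs 1
print(anova_from_summary([2, 2], [0.0, 2.0], [1.0, 1.0]))
```

Working from summaries is convenient when only a manual’s norms are available. The trade-off is that the result depends on how the original authors computed their SDs.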


Validity Data 


Table 3 indicates that ISB’s given at college entrance did differentiate students who entered psychological counseling within two years from those who did not. The biserial correlations between ISB scores and presence in, or absence from, the counseling group were moderate for both women (.42) and men (.37). They were lower than the biserial correlations of .50 for women and .62 for men reported by Rotter and Rafferty in the ISB manual (5, pp. 8-9). However, the two studies used different criteria. Rotter and Rafferty used direct ratings of adjustment, while the present study used the criterion of entering or not entering psychological counseling. The mother sample data of the present study are relevant here. The mothers’ adjustment was rated by a clinical psychologist, a criterion more directly comparable to that used by Rotter and Rafferty. The product-moment correlation of .49 between the psychologist’s ratings and the ISB scores of the 44 mothers compares favorably with Rotter’s biserial correlation of .50 for female students.

Table 3
Incomplete Sentences Blank Scores of Students Seeking and Not Seeking Psychological Counseling

                     Women                        Men
Statistic     Counseling  Noncounseling   Counseling  Noncounseling
N                 65          123             24          132
Mean            141.8        129.6          140.1        129.3
SD               19.5         15.6           17.9         15.2
r (biserial)           .42                          .37
t                      4.61                         3.03
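The biserial coefficients in Table 3 can be approximately recovered from the group summaries with the textbook formula r_bis = ((M1 - M0) / s_t) · (pq / y). Here p and q are the proportions of cases in the two groups, and s_t is the standard deviation of all scores combined. The symbol y is the height of the unit normal curve at the point that divides it into p and q. The sketch assumes that the dichotomy reflects an underlying normal variable, and it rebuilds s_t from the within-group SDs and means. This is one of several possible conventions, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def biserial_from_summary(n1, mean1, sd1, n0, mean0, sd0):
    """Biserial r between a score and a dichotomy, computed from group summaries.

    Group 1 is the counseling group and group 0 the noncounseling group.
    Assumes the dichotomy reflects an underlying normally distributed variable.
    """
    n = n1 + n0
    p, q = n1 / n, n0 / n
    grand = (n1 * mean1 + n0 * mean0) / n
    # Total variance = within-group variance + spread of group means
    # (group SDs used as given; the source's exact convention is unknown)
    total_var = (n1 * (sd1 ** 2 + (mean1 - grand) ** 2)
                 + n0 * (sd0 ** 2 + (mean0 - grand) ** 2)) / n
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the point splitting it into p and q
    y = unit_normal.pdf(unit_normal.inv_cdf(q))
    return (mean1 - mean0) / math.sqrt(total_var) * (p * q / y)


women = biserial_from_summary(65, 141.8, 19.5, 123, 129.6, 15.6)
men = biserial_from_summary(24, 140.1, 17.9, 132, 129.3, 15.2)
print(f"women {women:.2f}, men {men:.2f}")
```

With the Table 3 figures this gives about .42 for women and .37 for men, which agrees with the tabled values to two decimals.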

While the data in Table 3 indicate that col- 
lege students who sought psychological coun- 
seling were differentiated from those who did 
not, they do not indicate how accurately a 
student would be assigned to the counseling 
or noncounseling groups on the basis of his 
ISB score. The manual suggests a cutting score of 135 (5, p. 9). Applied to the ISB’s summarized in Table 3, that score correctly classified 64% of the noncounseling women and 63% of the noncounseling men. It correctly identified 66% of the women and 54% of the men who entered psychological counseling.
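The percentages above are hit rates for a cutting score, computed separately for each criterion group. The sketch below computes them from raw scores. It makes a hypothetical assumption: a score at or above the cutting score counts as a prediction that the student will seek counseling. The example scores are invented, because individual ISB scores are not published here.

```python
def hit_rates(counseled, noncounseled, cutoff=135):
    """Proportion of each group correctly classified by a cutting score.

    Assumption (hypothetical convention): a score at or above `cutoff`
    is treated as a prediction that the student will seek counseling.
    """
    if not counseled or not noncounseled:
        raise ValueError("both groups need at least one score")
    hits_counseled = sum(1 for s in counseled if s >= cutoff)
    hits_noncounseled = sum(1 for s in noncounseled if s < cutoff)
    return hits_counseled / len(counseled), hits_noncounseled / len(noncounseled)


# Invented scores, for illustration only
print(hit_rates([140, 130, 150, 135], [120, 136, 110, 100]))
```

Reporting the two rates separately, as the text does, shows that a single cutting score can perform quite differently in the two groups. This is the case for the men’s counseling group here.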


Summary 


In the present study, the reliability and 
validity of the Rotter Incomplete Sentences 
Blank were investigated in samples of col- 
lege students from a small liberal arts col- 
lege and in a sample of middle-class mothers. 

The following results were obtained:

(a) High interscorer agreement was found among scorers who had relatively little psychological training (a B.A. in psychology or less) and who were trained exclusively on the ISB manual. In all cases, interscorer reliabilities were above .90. The scores assigned to the various samples of Ss had comparable means and variances from scorer to scorer.

(b) Moderate test-retest reliability was found for periods of up to three years, suggesting that the ISB measures more than temporary moods. The test-retest reliability of the mothers’ ISB’s was somewhat higher than that found in the college students. This difference may have been due to the greater stability of the mothers’ environment. It may also have been due to the test and retest situations being more similar for the mothers than for the students.

(c) No consistent changes in means and variances were found upon retest.

(d) The normative data obtained in the present study did not differ significantly from the normative data presented by the ISB authors. This finding suggests that the authors’ ISB norms may prove applicable in a variety of college settings.

(e) When entering versus not entering psychological counseling was used as the criterion of college student adjustment, the ISB showed moderate screening validity, somewhat lower than that reported by the authors of the test. This lower validity may have been due to the criterion used in the present study. In the mother sample, the criterion of adjustment was similar to that employed by Rotter and Rafferty, and their validity figures were closely matched.


Received January 25, 1955. 




References 


1. Anderson, R. L., & Bancroft, T. A. Statistical theory in research. New York: McGraw-Hill, 1952.

2. Buros, O. K. (Ed.) The fourth mental measurements yearbook. Highland Park, N. J.: Gryphon Press, 1953.

3. Fisher, R. A. Statistical methods for research workers. Edinburgh: Oliver & Boyd, 1936.

4. Rotter, J. B., Rafferty, Janet, & Schachtitz, Eva. Validation of the Rotter Incomplete Sentences Blank for college screening. J. consult. Psychol., 1949, 13, 348-356.

5. Rotter, J. B., & Rafferty, Janet. Manual for the Rotter Incomplete Sentences Blank, College Form. New York: Psychological Corp., 1950.










Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


Predictive Behavior and Personal Adjustment¹


James Bieri, Edward Blacharsky, and J. William Reid 


Harvard University 


As part of the concern with the prediction 
of human behavior, increasing attention has 
recently been focused on the process of pre- 
diction. Predictive behavior is involved in the 
studies of social perception, empathy, and 
judging others (3), and represents an indi- 
vidual’s attempts to perceive, understand, and 
anticipate his own and other people’s actions 
in the social environment. An important as- 
pect of this problem for personality theory, 
the relationship between personal adjustment 
and predictive behavior, has received minimal 
attention. While the consensus of findings in 
this area points toward a positive relationship 
between adjustment and predictive accuracy 
in judging others, the evidence is not sub- 
stantial and contradictory findings have been 
reported (12). The present study is designed 
to investigate further the role general adjust- 
ment may play in predictive behavior. 

Two ostensibly opposing viewpoints are 
commonly expressed in regard to the influ- 
ence of adjustment on the accuracy of pre- 
dictive behavior. On the one hand, it is felt 
that the presence of personal problems and 
conflicts clouds or distorts the view of the 
perceiver and renders his understanding of 
others biased and inaccurate. On the other 
hand, it may be that some degree of malad- 
justment is necessary for an understanding of 
human behavior, especially in its deviant 
forms—no one understands the schizophrenic 
like a schizophrenic. While these two view- 
points suggest that the degree of maladjust- 
ment may be a crucial variable, the weight of 
clinical experience and theoretical rationale 
would appear to support the first alternative. 
While the neurotic may be sensitized to certain aspects of others’ behavior, large segments of experience seem to be selectively inattended, “forgotten,” or overlooked. Such assumptions are reflected in psychoanalytic theory, which stresses the restrictive effect of conflict and repression in neurotic behavior. Similarly, Sullivan discusses parataxic distortions as the hallmark of neurotic adjustment.

¹This study was made possible by a grant from the Laboratory of Social Relations, Harvard University.

For purposes of this study, we incline to- 
ward a definition of maladjustment in keep- 
ing with these views, i.e., one which stresses 
the conflictual nature of maladjusted behav- 
ior. The measure of maladjustment employed 
in this research is essentially a measure of the 
conflictual behavior expressed by the indi- 
vidual. It is assumed that the greater the de- 
gree of personal conflict, the more will the 
individual’s perceptions of others be subject 
to some amount of bias, inaccuracy, and dis- 
tortion. This forms the basis for Hypothesis I: 


There will be a negative relationship be- 
tween the degree of maladjustment and the 
accuracy of predictive behavior. 


If we examine this proposition further, we may ask what kind of inaccuracy is predicted for maladjusted subjects. One form of distortion available to the conflicted individual is projection, that is, perceiving others as similar to oneself. Other types of defenses may certainly be operative as well. However, tests of predictive behavior make it easy to measure one type of projection: the judge’s predictions can be compared with his own self responses. After Cameron (4), we call this behavior assimilative projection. Presumably, assimilative projection is a function of inadequate reality testing and of difficulty in differentiating oneself from the external world, both of which are associated with maladjustment. Hypothesis II states:




There will be a significant positive relation- 
ship between degree of maladjustment and 
the tendency to engage in assimilative projec- 
tion in predictive behavior. 


Method 


Subjects. Subjects (Ss) in the study were 
40 university undergraduate concentrators in 
Social Relations. The 33 males and 7 females 
in the sample were juniors and sophomores 
who met together in small tutorial groups 
once weekly. 

Adjustment measure. The criterion for de- 
gree of maladjustment was the S’s score on 
the College Form of the Rotter Incomplete 
Sentences Blank (ISB) (9). The ISB con- 
sists of 40 beginning stems of sentences which 
S is asked to complete. Examples of these 
stems are “I like ... ,” “My mind... ,” 
and “The future. . . .” Each sentence can be 
scored according to the degree of conflict ex- 
pressed in the response, varying from a score 
of zero (least conflict) to a score of six (most 
conflict) (10). With two scorers, interscorer 
reliability for the present study was .95. The 
range of scores was 94 to 182, with a mean of 
135.1 and a standard deviation of 19.6. This 
distribution of scores closely approximates a 
normal curve. 

Comparison of these scores with those in Rotter’s normative group indicates that the present sample obtained a higher mean ISB score and was also more variable.
It should be noted that Rotter recommends a 
cutting score of 135 to differentiate adjusted 
from maladjusted Ss. Inasmuch as this value 
closely corresponds to the median of our sam- 
ple, such a cutting score divides our group ap- 
proximately in half. 

Predictive test. Several criteria were consid- 
ered in the selection of a predictive test. In- 
asmuch as Ss’ self responses were to be used 
as criteria for accuracy, a test of known reli- 
ability was desirable. In order to facilitate 
a measurement of assimilative projection, a 
double alternative response (yes-no) was 
necessary. Finally, to cover a range of behaviors, the test should contain items about observable behavioral characteristics as well as about more internalized, less observable psychological states. To meet these criteria, the 50-item Manifest Anxiety Scale (MAS) (13) was selected as the predictive test. Inspection of this scale indicates that approximately half of the items are concerned with outwardly expressed behavior and half with more covert behavior.

Procedure. Each S took part in the experi- 
ment after his tutorial group had met three 
times. Thus, although Ss had a working ac- 
quaintance with one another, they were still 
in the early phases of mutual acquaintance- 
ship. In an individual session, S first took the 
MAS. The S was then asked to rank the other 
5 members of his tutorial group (who were 
also Ss in the experiment) according to how 
well he felt he knew them. The two persons ranked in the intermediate positions, three and four, were selected as the others (Os) for S to predict. In this way, an attempt was made to
control the variable of familiarity, which 
could conceivably affect prediction scores. 
The S was asked to fill out the MAS in the 
way in which he thought persons three and 
four would fill out theirs. Each S, therefore, 
predicted two Os on the MAS. Following 
this, S was asked to complete the ISB. 

Scores. The scores of predictive behavior were derived in a manner identical to that reported in an earlier study (2). Predictive accuracy scores were obtained by summing the correct predictions made by S on both Os. A prediction counted as accurate when it agreed with the response O actually gave. The assimilative projection score is the number of S’s predictions that correspond to his own self responses. These two scores were used in testing Hypotheses I and II. The predictive accuracy and assimilative projection scores share a common component: the number of S’s accurate predictions that were identical to his own responses. We call this component score accurate projection. Two additional component scores were studied. The first, inaccurate projection, consists of those predictions that are the same as S’s own responses but different from O’s responses. The second, accurate perceived differences, consists of those predictions that are accurate and differ from S’s self responses. These three component scores were used to assess the predictive behavior of the Ss in more detail.

Table 1
Relationships Between ISB and MAS Scores and Predictive Accuracy Measures (N = 40)

                               ISB*              MAS*
Measure                      r      p†         r      p†
Predictive Accuracy         .19    .23       -.12    .45
Assimilative Projection    -.38    .02       -.71   <.001
Accurate Projection        -.23    .15       -.64   <.001
Inaccurate Projection      -.24    .13       -.18    .26
Accurate Perceived
  Differences               .49    .002       .69   <.001

* High scores reflect greater maladjustment.
† Two-tailed test.
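The scoring rules above cross-classify each yes-no prediction by two questions: does it match S’s own answer, and does it match O’s answer? The Python sketch below follows the definitions in this section. The function and key names are illustrative. Items where the prediction matches neither S nor O are left uncounted, because the text defines no score for them.

```python
def prediction_scores(self_resp, other_resp, predicted):
    """Component scores for one judge (S) predicting one other person (O).

    self_resp:  S's own yes/no answers (True = yes)
    other_resp: O's actual answers
    predicted:  S's predictions of O's answers
    """
    if not len(self_resp) == len(other_resp) == len(predicted):
        raise ValueError("all three answer lists must have the same length")
    ap = ip = apd = 0
    for own, actual, guess in zip(self_resp, other_resp, predicted):
        if guess == own and guess == actual:
            ap += 1    # accurate projection: like self and like O
        elif guess == own:
            ip += 1    # inaccurate projection: like self, unlike O
        elif guess == actual:
            apd += 1   # accurate perceived difference: unlike self, like O
    return {
        "accurate_projection": ap,
        "inaccurate_projection": ip,
        "accurate_perceived_differences": apd,
        "predictive_accuracy": ap + apd,
        "assimilative_projection": ap + ip,
    }


def total_over_others(score_dicts):
    """Sum component scores across the Os that S predicted (two in this study)."""
    keys = score_dicts[0].keys()
    return {key: sum(d[key] for d in score_dicts) for key in keys}


one_other = prediction_scores([True, True, False, False],
                              [True, False, True, False],
                              [True, True, True, True])
print(total_over_others([one_other, one_other]))
```

In this sketch, predictive accuracy and assimilative projection are derived from the three components, not counted separately. This makes it clear that the two share accurate projection.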


Results 


Table 1 contains the product-moment correlations between predictive behavior scores and ISB scores used to test the experimental hypotheses. Correlations between predictive behavior scores and MAS scores are also reported in Table 1 for comparison with the ISB results, although they were not used to test the hypotheses. Hypothesis I states that there will be a negative relationship between predictive accuracy and maladjustment. No support for the hypothesis is found. In fact, the obtained nonsignificant correlation (r = .19) runs in the direction opposite to that predicted. Hypothesis II predicted a positive relationship between maladjustment and assimilative projection. This hypothesis is not supported either: the relationship obtained (r = -.38) is significant at the .02 level in the direction opposite to the prediction. That is, higher assimilative projection scores tend to be associated with scores indicating better adjustment. These findings become clearer when we analyze the relationships between the three component scores and ISB scores. Both accurate projection and inaccurate projection correlate negatively, and about equally, with maladjustment (r = -.23 and -.24, respectively). Thus, the assimilative projection tendency of the better adjusted Ss favors neither the accurate nor the inaccurate projection component.

The predictive accuracy score is composed of the accurate projection measure and the accurate perceived differences measure. As we have seen, the former tends to correlate negatively with ISB scores. Turning to the other component of predictive accuracy, Table 1 shows a rather striking positive relationship between accurate perceived differences and ISB scores (r = .49). That is, high maladjustment scores on the ISB tend to be associated with accurate prediction of differences. This relationship can be analyzed further by dividing the accurate perceived difference scores into four columns based on quartiles of ISB scores. An analysis of variance yields an F value of 5.02 (p < .01). The Ss in the lowest ISB quartile (most adjusted) had a mean accurate perceived difference score of 9.5, while the Ss in the fourth quartile had a mean of 20.8.
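The quartile analysis can be sketched in three steps. First, rank the Ss by ISB score. Next, cut the ranking into four equal groups (10 Ss each when N = 40). Finally, run a one-way analysis of variance on the accurate perceived difference scores, which here has 3 and 36 degrees of freedom. The Python below is a minimal illustration. The text does not say how tied ISB scores were handled, so this sketch breaks ties by input order. The helper names are hypothetical.

```python
def quartile_groups(isb_scores, values):
    """Split `values` into four equal-rank groups by ascending ISB score.

    Ties are broken by input order (an assumption; the source does not say).
    """
    n = len(isb_scores)
    if n < 4 or len(values) != n:
        raise ValueError("need at least four Ss and one value per S")
    order = sorted(range(n), key=lambda i: isb_scores[i])  # stable sort
    groups = [[] for _ in range(4)]
    for rank, i in enumerate(order):
        groups[rank * 4 // n].append(values[i])  # group 0 = lowest ISB (most adjusted)
    return groups


def one_way_f(groups):
    """F ratio for a one-way analysis of variance on raw scores."""
    scores = [v for g in groups for v in g]
    total_n, k = len(scores), len(groups)
    grand = sum(scores) / total_n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


# Toy data: eight Ss whose ISB ranks match their scores
groups = quartile_groups(list(range(8)), [1, 2, 3, 4, 5, 6, 7, 8])
print(groups, one_way_f(groups))
```

Splitting on ranks rather than fixed score boundaries guarantees equal group sizes, which matches the four equal columns described in the text.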

Thus the better adjusted Ss tend to predict most accurately on the basis of similarities between themselves and others. The maladjusted Ss, to a greater degree than the adjusted Ss, emphasize accurate prediction of differences between themselves and others. This is shown most clearly when each S’s ratio of accurate projection to accurate perceived differences is correlated with his ISB score. The obtained correlation is -.46 (p < .01). That is, Ss with high (maladjusted) ISB scores tend to have higher accurate perceived difference scores relative to their accurate projection scores than do Ss with low (adjusted) ISB scores.

Although they did not enter into the test- 
ing of hypotheses in this study, the relation- 
ships between MAS scores and the predictive 
accuracy scores are informative (Table 1). 
The fact that all but one of these correla- 
tions are in the same direction as those ob- 
tained with ISB scores may be expected since 
both tests attempt to measure behavior re- 
lated to the realm of personal maladjustment. 
The correlation between ISB scores and MAS 
scores is .46 (p < .01). It should be noted 
that the MAS scores obtained on our sample 
compare closely to the scores obtained by 
Taylor (13). We obtained a mean of 15.4, 
standard deviation of 7.6, and a range from 
5 to 40. 





354 James Bieri, Edward Blacharsky, and J. William Reid 


With the exception of the predictive accuracy and the inaccurate projection measures, MAS scores correlated more highly with prediction scores than did ISB scores. Thus, Ss who are low on the MAS perceived others as like themselves (r = -.71); however, they tend to be accurate in these projections (r = -.64). The Ss who are high on the MAS tend to accurately perceive others as different from themselves (r = .69).²


Discussion 


Within the realm of behavior sampled in 
this study, the present results suggest the fol- 
lowing hypotheses for further investigation. 
First, there will not be a significant relation- 
ship between adjustment and gross predic- 
tive accuracy. Second, the tendency to pre- 
dict others’ self ratings as similar to one’s 
own will be positively correlated with adjust- 
ment. Third, the tendency to predict accurate 
differences between oneself and others will be 
positively correlated with maladjustment. Let 
us examine each of these propositions in the 
light of this study. 

That predictive accuracy is not reliably associated with adjustment, as originally hypothesized, may reflect the relative homogeneity of our sample in terms of the adjustment variable. For example, if we had studied more grossly disturbed Ss than those of our sample, Hypothesis I might have been supported. On the other hand, the ISB and MAS scores indicate, and our clinical impressions concur, that the sample did include a wide range of students relative to the adjustment variable. Thus, although a lack of extremes, particularly at the maladjusted end of the distribution, may have worked against our predictive accuracy hypothesis, we feel this factor alone does not explain our results. Other studies have reported similarly negligible relationships between predictive accuracy and adjustment (5, 6, 7). We are particularly impressed by the differential relationships that the assimilative projection and accurate perceived differences measures have with maladjustment. In effect, it is suggested that, within the realm of personal adjustment we have sampled, the relative adjustment of an S influences his predictive behavior not in terms of how accurate or inaccurate it is, but rather in terms of the mode of predictive behavior engaged in. In terms of the ideas underlying our original hypotheses, the well-adjusted S projects not as a function of conflict, but rather because he has less need to discriminate differences between self and others. Assuming others are like oneself is adequate in that, although the assumption may be incorrect, the projection itself does not represent a gross distortion of reality because the projector is relatively well adjusted. The less well-adjusted person, however, may become attuned to differences as he realistically perceives differences between himself and others, and as he seeks to understand and differentiate these differences. Therefore, both types of behavior reflect adequate coping with predicting and perceiving others when the individual’s status relative to adjustment is considered.

2 A brief comment about the relationship between the ISB and MAS can be made. Upon inspection, it was found that conflict scores on two sentences of the ISB correlated with MAS scores as highly as, or more highly than, the total ISB scores did. These sentences and their correlations with MAS were “I suffer . . .” (r = .49) and “My nerves . . .” (.46). When the scores for these two sentences are combined, the correlation with MAS scores becomes .61. While these may be chance relationships, they suggest that a few such stems could approximate a longer questionnaire in measuring the degree of anxiety an S will report about himself.
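Footnote 2's observation, that two stems combined correlate more highly with the MAS (.61) than either stem alone (.49 and .46), follows from the standard formula for the correlation of a sum with a third variable: r(X+Y, Z) = (r_XZ·s_X + r_YZ·s_Y) / sqrt(s_X² + s_Y² + 2·r_XY·s_X·s_Y). The combined correlation exceeds the single-stem values whenever the two stems are not too highly correlated with each other. The inter-stem correlation and the SDs are not reported, so the value of r_XY in the example below is an assumption chosen only for illustration.

```python
import math


def corr_of_sum(r_xz: float, r_yz: float, r_xy: float,
                s_x: float = 1.0, s_y: float = 1.0) -> float:
    """Correlation between (X + Y) and Z from pairwise correlations and SDs."""
    cov_sum_z = r_xz * s_x + r_yz * s_y  # cov(X + Y, Z) divided by s_Z
    sd_sum = math.sqrt(s_x ** 2 + s_y ** 2 + 2 * r_xy * s_x * s_y)
    return cov_sum_z / sd_sum


if __name__ == "__main__":
    # Reported stem-MAS correlations; r_xy = 0.3 is an assumed value.
    print(round(corr_of_sum(0.49, 0.46, 0.3), 2))
```

With r_XY = .30 and equal SDs the formula gives about .59. A lower inter-stem correlation would push the combined value higher.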

Alternative explanations must be explored to determine whether these relationships represent subject variation alone or are influenced by the characteristics of the experimental situation. If we score Ss’ predictions of Os for manifest anxiety, we find a slight, nonsignificant tendency for Ss high on the ISB to predict others as high and for low Ss to predict others as low (r = .10). Thus, the mean predicted MAS score for high ISB (upper half) Ss is 14.3, while that for low ISB (lower half) Ss is 13.3. This tendency is coupled with the fact that there is a negative, nonsignificant relationship between the actual MAS scores of the Os being predicted and the ISB scores of the Ss who predicted them (r = -.23). Since Ss with high ISB scores tended to predict Os as high who were in actuality somewhat lower, we would expect them to have high inaccurate projection scores. Since Ss with lower ISB scores tended to predict Os as less anxious who in actuality were somewhat more anxious, we would also expect them to have high inaccurate projection scores. In spite of these trends, the results (Table 1) indicate that scores on the ISB did not correlate appreciably higher with inaccurate projection (r = -.24) than with accurate projection (r = -.23). These effects may, however, have tended to inflate the inaccurate projection scores.

Predictive Behavior and Personal Adjustment

It is considered that several factors in this 
study did facilitate the expression of assimila- 
tive projection and perceived differences on 
the part of the Ss. As we have noted, Ss were 
relatively recently acquainted, and with a lack 
of intensive knowledge about the Os, the free 
play of projection may be expected (1). In 
addition, Ss were faced with an either-or pre- 
dictive situation. That is, owing to the true 
or false response alternatives on the MAS, S 
either had to express a similarity or a dis- 
similarity with himself when he made predic- 
tions. It should be pointed out, however, that 
other investigators have observed similar be- 
haviors utilizing different instruments. Sears’ 
concept of “contrast formation” appears allied 
to our notion of accurate perceived differ- 
ences, that is “. . . a dynamic process which 
operates in the opposite direction from pro- 
jection” (11, p. 161). Goldings (8) reports 
similar tendencies for Ss to engage in either 
“supplementary” projection or “contrast” pro- 
jection. Thus, evidence from other studies also 
suggests that predictive behavior may entail 
a two-directional process. Either the S may 
invoke assimilative or supplementary projec- 
tion behavior in his predictions or he may 
emphasize contrast or complementary pro- 
jections (perceived differences). The present 
study suggests as an hypothesis for future 
test that such differences are partially a func- 
tion of the personal adjustment of the indi- 
vidual. 


Summary 


Two hypotheses were tested relative to the relationship between personal adjustment and predictive behavior. It was predicted that degree of maladjustment would correlate negatively with predictive accuracy. Further, it was predicted that a positive relationship would exist between degree of maladjustment and the use of assimilative projection in one’s predictive behavior. The Rotter Incomplete Sentences Blank was administered to 40 undergraduate Ss and was used as the criterion of adjustment. The Manifest Anxiety Scale was used as the predicting test.

The results provide no support for either 
hypothesis. An insignificant positive correla- 
tion existed between adjustment and predic- 
tive accuracy scores. When the components 
of the predictive accuracy score were ana- 
lyzed, a significant (p = .002) positive rela- 
tionship was found between maladjustment 
and the ability to predict differences correctly. 
A significant negative correlation (p = .02) 
existed between assimilative projection and 
maladjustment. However, neither component 
of the assimilative projection measure, i.e., 
accurate projection and inaccurate projection, 
correlated significantly with the adjustment
criterion. If the MAS is used as the criterion 
variable, more pronounced relationships are 
obtained than with the ISB. 

Analysis of the data indicates that the 
tendency for adjusted Ss to predict similari- 
ties (project) and for maladjusted Ss to ac- 
curately predict differences reflects behavioral 
tendencies of the Ss and is not an artifact of 
the test situation. The similarity of these find- 
ings to previous research is briefly discussed. 
These findings are discussed as suggestions 
for hypotheses to be tested relative to the in- 
fluence of adjustment on predictive behavior. 


Received March 7, 1955. 


References 


1. Bieri, J. Changes in interpersonal perceptions fol- 
lowing social interaction. J. abnorm. soc. Psy- 
chol., 1953, 48, 61-66. 

2. Bieri, J. Cognitive complexity-simplicity and 
predictive behavior. J. abnorm. soc. Psychol., 
1955, 51, 263-268. 

3. Bruner, J. S., & Tagiuri, R. The perception of 
people. In G. Lindzey (Ed.), Handbook of 
social psychology. Cambridge, Mass.: Addison- 
Wesley, 1954. Pp. 634-654. 

4. Cameron, N. The psychology of behavior dis- 
orders. Boston: Houghton Mifflin, 1947. 

5. Davids, A. Alienation, social apperception, and 
ego structure. J. consult. Psychol., 1955, 19, 
21-27. 










6. Estes, S. G. Judging personality from expressive 
behavior. J. abnorm. soc. Psychol., 1938, 33, 
217-236. 

7. Gage, N. L. Judging interests from expressive 
behavior. Psychol. Monogr., 1952, 66, No. 18 
(Whole No. 350). 

8. Goldings, H. S. On the avowal and projection of 
happiness. J. Pers., 1954, 23, 30-37. 

9. Rotter, J. B., Rafferty, Janet E., & Schachtitz, 
Eva. Validation of the Rotter Incomplete Sen- 
tences Blank for college screening. J. consult. 
Psychol., 1949, 13, 348-356. 




10. Rotter, J. B., & Rafferty, Janet E. Manual for 
the Rotter Incomplete Sentences Blank, Col- 
lege Form. New York: Psychological Corp., 
1950. 

11. Sears, R. R. Experimental studies of projection:
I. Attribution of traits. J. soc. Psychol., 1936, 
7, 151-163. 

12. Taft, R. The ability to judge people. Psychol.
Bull., 1955, 52, 1-23. 

13. Taylor, Janet A. A personality scale of manifest 
anxiety. J. abnorm. soc. Psychol., 1953, 48, 
285-290. 






















Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


Success in Psychotherapy as a Function of Certain
Actuarial Variables¹


Desmond S. Cartwright 


University of Chicago 


In a recent publication, Seeman (9) has 
reported that age is unrelated to rated suc- 
cess in client-centered psychotherapy, that 
women tend to be rated more successful than 
men (p < .05), and that there is a trend in
favor of longer cases: “. . . shorter cases 
spanned the entire range of success ratings 
from 1 to 9, while the longer cases were 
judged to fall on two high points of the scale 
(points 7 and 8)” (9, p. 105). 

Differences between the sexes and between 
case lengths, if confirmed, would clearly have 
important implications for both the theory 
and the practice of psychotherapy. In See- 
man’s study, all therapists were males. If it 
were shown that clients tend to be more suc-
cessful with a therapist of the opposite sex,
such a finding would suggest that successful 
therapy depends to some extent upon the es- 
tablishment of a satisfactory heterosexual re- 
lationship. It would also suggest that, where 
there is the possibility of choice, female cli- 
ents should see only male counselors, and 
perhaps also that male clients should see only 
female counselors. 

The question “How long does therapy
take?” is an important one for therapists no
less than for clients. It is also important sci- 
entifically. There may be individual differ- 
ences among clients. Indeed, the difference 
between a client with mainly situational prob- 
lems and one with problems of personal ad- 
justment has long been thought to have a 
bearing on the length of therapy. 


1 This investigation was supported by a research 
grant (PHS M 903) from the National Institute of 
Mental Health, of the National Institutes of Health, 
Public Health Service. 


Purpose of the Present Study 


Since Seeman’s report covered the results 
for only 23 clients, it was the purpose of the 
present study to test his findings with a 
larger sample. In addition to the variables he 
used, the variable of student vs. nonstudent 
status was included in this investigation. This 
variable was included not only because of the 
individual differences that may be associated 
with it, but also because of the difference in 
fees charged to students and to nonstudents 
for therapeutic interviews. 


Procedure 


The 78 clients whose ratings of success 
were examined in this investigation were all 
seen by client-centered therapists at the 
Counseling Center, University of Chicago, 
during the period 1949 to 1954. Data were 
gathered on the largest possible number of 
cases in the Counseling Center research files. 
Thus all the clients had agreed to take part 
in research of one kind or another prior to 
commencing therapy. Only those cases were 
taken from the files which had data on all 
five variables required by this investigation. 
These data are summarized in Table 1. 

The 78 clients were all seen by 17 male 
therapists, all of whom had at least 341 in- 
terview hours of experience prior to working 
with a research client. 

Seeman’s report was concerned with clients who had completed at least six interviews. It was the first purpose of the present study to examine his findings in regard to sex and number of interviews, so that the first part of the investigation was concerned only with clients who had completed at least six interviews. For this purpose only 62 cases were available. The subsequent examination included clients who had had at least one interview, for which purpose the 78 clients as described in Table 1 were available.

Table 1

Means and Standard Deviations on Age, Number of Interviews, and Success Rating for Total N = 78, for Males vs. Females, and for Students vs. Nonstudents

                       Age           Interviews        Success
Group         N     Mean     SD     Mean      SD     Mean     SD
Total        78    26.47   4.78    26.55   30.16     4.95   2.24
Males        45    26.76   3.75    33.22   35.57     4.98   2.13
Females      33    26.09   5.87    17.46   16.73     4.91   2.38
Students     48    25.46   4.29    32.10   35.51     5.33   2.15
Nonstudents  30    28.10   5.06    17.67   14.77     4.33   2.24

Since counselor ratings of success were the 
criterion variable in this investigation, some 
estimates of their reliability and validity were 
sought. 

Reliability. Seeman (9) reported a mean 
reliability of .81 for counselor judgments. 
This figure was obtained, however, by asking 
seven counselors to rerate their clients on 
each of 10 nine-point rating scales, including 
the rating for success. In the absence of reli- 
ability data pertaining strictly and only to 
counselor ratings of success, the present writer 
asked eight counselors to rerate 15 clients 
whom they had previously rated for research 
projects at the Counseling Center, University 
of Chicago. The mean length of time between 
first and second ratings was 14.2 months, a 
span over which considerable forgetting might 
have been expected. The rate-rerate reliabil- 
ity was r = .86. This result, with that of See- 
man, may be regarded as evidence that the 
nine-point counselor rating scale of success in 
therapy is a reliable instrument. 

Validity. The validity of counselor ratings of success may be estimated from the reported correlations between such ratings and other measures of the process and outcomes of therapy. Raskin (7) reported a rho correlation of .70 (p < .01)² between counselor ratings and the mean degree of improvement according to five methodologically independent protocol-analysis measures for 10 cases. Vargas (10) reported rho correlations between counselor ratings and six independent process measures of improvement on 10 cases; his rhos range from .64 to .99, two being significant at the .05 level and four significant at better than the .01 level.

2 The p value was computed by the present writer on the basis of the probable error reported by Raskin, PE = .11.
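Footnote 2's conversion can be sketched as follows. The probable error of a statistic is 0.6745 times its standard error, so Raskin's PE = .11 corresponds to an SE of about .163, and a rho of .70 then lies about 4.3 standard errors from zero. The sketch assumes a normal approximation and a two-tailed test; the present writer's exact procedure is not stated, and `p_from_probable_error` is a hypothetical helper.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 * standard error (normal curve)


def p_from_probable_error(estimate: float, probable_error: float) -> float:
    """Two-tailed normal-approximation p for estimate / SE, with SE derived from PE."""
    se = probable_error / PE_FACTOR
    z = abs(estimate) / se
    # Two-tailed tail area of the standard normal curve: erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    p = p_from_probable_error(0.70, 0.11)
    print(f"p = {p:.2e}")  # well below .01
```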

The correlations reported between counselor ratings and measures of outcome are somewhat more variable than those between ratings and process measures. Carr (1) found no significant correlation between counselor ratings and extent of change as measured by the Rorschach test. While other investigators have found varying degrees of significance in the relation between ratings and Rorschach indicators (5, 6), it seems that Rorschach measures cannot be said to validate counselor ratings. Dymond (3) reported a 2 × 2 table showing the numbers of low success clients (rated 1-6) and high success clients (rated 7-9) who improved, and of those who did not improve, as measured by blind ratings of Thematic Apperception Test protocols on a seven-point scale of adjustment. With N = 22, the phi coefficient computed by the present writer on her data is .47, with p < .05, as reported by Dymond for the chi-square test. Rogers (8) reported the correlation between counselor ratings and changes from before to after therapy based on evaluations of the client by observers using the Willoughby Emotional Maturity Scale. The sample consisted of 32 clients, and the correlation was .41 (p < .05).

















Therapy and Certain Actuarial Variables 359 


So far the measures reported have all been measures of change during therapy in relation to counselor ratings. Findings in regard to the correlation between measures of adjustment after therapy and the counselor ratings of outcome would also seem important, since it is likely that not only extent of change but also absolute level of adjustment at the completion of therapy may be factors entering into the counselor’s judgment. In the study mentioned before, Dymond reported that the correlation between counselor ratings and the post-therapy TAT ratings, for N = 25, was .40 (p < .05). Gordon and Cartwright (4) reported a rho correlation of .60 (p < .01) between counselor ratings and Q-adjustment scores (2) after therapy.

In summary, it would seem that for differ- 
ent samples, using different measures, varying 
degrees of correlation between counselor rat- 
ings and objective measures of the process 
and outcomes of therapy have been obtained. 
Such variation is to be expected. With the ex- 
ception of the equivocal findings in relation 
to the Rorschach test, however, all studies 
showed a degree of correspondence between 
the ratings and the objective measures which 
is significant at better than the .05 level. In 
view of the fact that no corrections for at- 
tenuation were made on these correlations, it 
seems reasonable to conclude that the coun- 
selor nine-point rating scale of success in 
therapy is a valid instrument. 
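The remark about attenuation can be made concrete with Spearman's correction, r_c = r_XY / sqrt(r_XX·r_YY). As an illustration only, suppose the rate-rerate reliability of .86 is taken as the reliability of the counselor ratings and the other measure is treated as perfectly reliable, which gives the smallest possible correction. The .60 reported by Gordon and Cartwright would then rise to about .65. Applying the formula to a rank-order coefficient is itself an approximation, and the helper name is hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float = 1.0) -> float:
    """Spearman's correction: estimated correlation between true scores."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # .60 observed (Gordon and Cartwright); .86 = counselor rating reliability;
    # the outcome measure is assumed perfectly reliable here.
    print(round(correct_for_attenuation(0.60, 0.86), 3))  # 0.647
```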


Results 


Re-examination of Seeman’s findings. In Table 2, means and SDs of counselors’ ratings are shown for 78 clients divided into 10 groups according to number of interviews.
The data on 62 clients who had had six in- 
terviews or more were examined in regard to 
the interrelations between the variables of 
sex, number of interviews, and success rating. 
Since these results were essentially duplicated 
by those based upon the subsequent examina- 
tion of the data for all 78 cases, it is appro- 
priate to report the latter only. 

Sex difference and success rating. Based on the data of Table 1, the point-biserial correlation between sex and success rating was -.02. This clearly nonsignificant result fails to support Seeman’s finding that females tend to be rated as more successful than males.

Table 2

Means and Standard Deviations of Success Ratings for N = 78, Divided into 10 Groups by Number of Interviews

Number of            Success rating
interviews     n     Mean     SD
1-5           16     3.00   2.09
6-13          20     5.35   1.90
14-21          9     3.11   1.45
22-29          7     6.43   1.72
30-37          7     5.43   1.29
38-45          4     5.50    .87
46-53          3     7.00    .82
54-61          4     6.00   2.35
62-69          4     7.00    .71
70-77+         4     6.25   2.05

Sex difference and length of therapy. The 
median number of interviews for the 78 cli- 
ents was 15.5. From Table 3, it is seen that 
there is a significant tendency for males to 
take longer in therapy than females. An esti- 
mate of the degree of contingency was made 
using the phi coefficient, with a result of phi 
= .23. Though significant, only a small pro- 
portion of the variance in length of therapy 
would seem to be attributable to the sex of 
the client. 
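The phi coefficient for a fourfold table can be computed directly from the counts in Table 3. The sketch below uses the standard formula phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)) together with the uncorrected chi square, N·phi². The article does not state whether a continuity correction was applied, and the helper name is hypothetical.

```python
import math
from typing import Tuple


def phi_and_chi_square(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Phi coefficient and uncorrected chi square for the 2 x 2 table

           col1  col2
    row1    a     b
    row2    c     d
    """
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom
    chi_square = n * phi ** 2  # Pearson chi square without Yates's correction
    return phi, chi_square


if __name__ == "__main__":
    # Table 3: rows = male, female; columns = below median, at or above median.
    phi, chi2 = phi_and_chi_square(18, 27, 21, 12)
    print(f"phi = {phi:.2f}, chi square = {chi2:.2f}")
```

With the Table 3 counts this gives a phi of about -.23 (the sign only reflects the order of rows and columns), matching the reported .23. The chi square is about 4.25, above the .05 critical value of 3.84 for 1 df.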

Sex difference and age. The point-biserial correlation between sex and age was -.07, a nonsignificant result.

Sex and status difference. There is no sig- 
nificant contingency between the variables of 
sex and status in the present sample. Chi 
square was 2.43 (p > .10). 


Table 3

Fourfold Table Showing the Relation Between Sex and Length of Therapy (N = 78)

              Number of interviews
           Below       At or above
Sex        median      median
Male         18          27
Female       21          12










Age and success. The product-moment cor- 
relation between age and rated success was 
.16, which is not significant at the .10 level 
of probability. This result supported Seeman’s 
finding, and his interpretation that, within 
the age limits of the sample (18-43), the age 
of the client is unrelated to his degree of rated 
success in therapy. 

Age and length of therapy. The point-bi- 
serial correlation between age and number of 
interviews, dichotomized at the median, was 
.14, a result not significant at the .10 level 
of probability. 

Age and status difference. From Table 1 
it seems possible that students tend to be 
younger than nonstudents in the present sam- 
ple. This is doubtless true in the
general population. The point-biserial corre- 
lation between age and status was .27, a re- 
sult significant at better than the .01 level of 
probability. The present sample would seem 
to have been drawn from the general popula- 
tion in respect to age and status. It has been 
seen already that age appears to be unrelated 
to success. The relation between status and 
success will be reported next. 

Status difference and success. The point-biserial correlation between status and rated success was -.22, significant at better than the .05 level, showing that students tend to be rated as more successful in therapy than nonstudents. From Table 1, it is seen that the mean success rating for students was 1.00 point higher than that for nonstudents. However, from Table 1 it is also seen that students had nearly twice as many interviews, on the average, as nonstudents. The success difference between the two groups may have been related to the difference in length of therapy.
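The point-biserial values reported in this section can be checked against Table 1 using the summary-statistics form r_pb = (M1 - M0)/s_t · sqrt(pq). Here s_t is the SD of the total sample, and p and q are the proportions in the two groups. The sketch assumes s_t was computed with N in the denominator; the helper name is hypothetical.

```python
import math


def point_biserial_from_summary(mean_1: float, mean_0: float, sd_total: float,
                                n_1: int, n_0: int) -> float:
    """Point-biserial r from group means, total-sample SD, and group sizes.

    r_pb = (M1 - M0) / s_t * sqrt(p * q), with p = n_1 / N and q = n_0 / N.
    """
    n = n_1 + n_0
    p, q = n_1 / n, n_0 / n
    return (mean_1 - mean_0) / sd_total * math.sqrt(p * q)


if __name__ == "__main__":
    # Values from Table 1 (total N = 78).
    checks = {
        "status vs success": point_biserial_from_summary(5.33, 4.33, 2.24, 48, 30),
        "sex vs success": point_biserial_from_summary(4.98, 4.91, 2.24, 45, 33),
        "sex vs age": point_biserial_from_summary(26.76, 26.09, 4.78, 45, 33),
        "age vs status": point_biserial_from_summary(25.46, 28.10, 4.78, 48, 30),
    }
    for label, r in checks.items():
        print(f"{label}: {r:+.2f}")
```

The magnitudes come out as .22, .02, .07, and .27, agreeing with the text. The signs are the reverse of those printed, which suggests only that the article coded the dichotomies the other way round.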

Status difference and length of therapy. 
There was, however, no significant relation 
between the variables of status and number 
of interviews (chi square = .22, p > .10).
The computation was repeated using the 
mean as point of dichotomy for number of 
interviews, and a similarly nonsignificant re- 
sult was obtained. The large SD’s shown in 
Table 1 for the interview variable make it 
clear that the difference between the status 
groups in mean length of therapy is unre- 
liable. Length of therapy does not account 
for the success difference between the groups; 
rather, for reasons unknown, students tend to 
be rated as more successful than nonstudents. 

Length of therapy and success. For the purpose of examining the relation between length of therapy and rated success, a preliminary graphic representation of the data contained in Table 2 was made. From Figure 1 it would seem that, for the longer cases, a nonlinear relation exists between number of interviews and rated success. Omitting the 36 clients with fewer than 14 interviews, the correlation ratio of success rating on number of interviews for the remaining 42 clients was eta = .67, a value significant at better than the .01 level.

Fig. 1. Rated success as a function of number of interviews. [Plot: mean success rating against mean number of interviews.]

Fig. 2. Rated success as a function of number of interviews within the short therapy group. [Plot: mean success rating against mean number of interviews.]

It seemed possible too, from Figure 1, that the shorter cases might also display a nonlinear relation between rated success and number of interviews. Accordingly, the group of 44 clients having 18 or fewer interviews was plotted in a similar graph, in Figure 2. All subgroups by mean number of interviews in Figure 2 have an n of 4 or greater.

It is evident from Figure 2 that the shorter 
cases show a curvilinear relation between 
number of interviews and rated success. For 
N = 44, the correlation ratio of success rat- 
ing on number of interviews was eta = .66, a 
result significant at better than the .01 level. 
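The correlation ratio can also be approximated from the grouped data in Table 2, using eta² = SS_between / (SS_between + SS_within). The sketch assumes that each tabled SD uses n as its divisor; the SDs of .82 and .71 for the small groups with a mean of 7.00 are attainable on an integer rating scale only under that convention. It takes the eight Table 2 rows covering 14 or more interviews, and the helper name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def eta_from_groups(groups: Sequence[Tuple[int, float, float]]) -> float:
    """Correlation ratio (eta) of Y on a grouping, from (n, mean, sd) per group.

    Assumes each sd was computed with n (not n - 1) in the denominator.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum(n * sd ** 2 for n, _, sd in groups)
    return math.sqrt(ss_between / (ss_between + ss_within))


if __name__ == "__main__":
    # Table 2 rows for 14 or more interviews (42 clients).
    long_cases = [
        (9, 3.11, 1.45), (7, 6.43, 1.72), (7, 5.43, 1.29), (4, 5.50, 0.87),
        (3, 7.00, 0.82), (4, 6.00, 2.35), (4, 7.00, 0.71), (4, 6.25, 2.05),
    ]
    print(f"eta = {eta_from_groups(long_cases):.2f}")
```

With the rounded published values this gives an eta of about .66, close to the reported .67. The small gap may reflect rounding in the table or a different grouping of interview counts in the original computation.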

It is noticeable from Figure 1 that the mean success rating of clients whose mean number of interviews is 17.5 is little higher than the mean success rating for clients whose mean number of interviews is 3.0. This general result is made clearer in Figure 2. There seems to be a “failure zone” in therapy, ranging around the 17-interview point. This possibility was examined by taking the 24 clients with length of therapy ranging between 10 and 30 interviews, and again plotting mean success rating against mean number of interviews, as shown in Figure 3.

The data in Figure 3 seem to support the possibility that there is a “failure zone.” Taking the lower limit of the interval whose mean is 14 interviews and the upper limit of the interval whose mean is 20 interviews, the range obtained is from 13 to 21 interviews. Ten clients fall within this range. On both sides of this range are groups of relatively more successful clients. Among the short-case clients, the seven who had a mean of 11 interviews (Figure 3) had a mean success rating of 5.86. Among the long-case group, the seven clients having from 22 to 30 interviews had a mean success rating of 6.43. The mean success rating for those who fell in the “failure zone” was 3.20. The difference between the mean success rating of the seven short-case clients and the mean of the 10 failure-zone clients was tested for significance, and the resulting t was 2.86 (p < .01). The failure-zone clients were similarly compared with the seven long-case clients, yielding a t of 4.05 (p < .01). The difference between the mean success ratings for the seven short-case and the seven long-case clients was not significant (t = .51). These results are unambiguously consistent with the hypothesis that there is a “failure zone” in therapy, ranging between 13 and 21 interviews.

Fig. 3. Failure zone between short and long therapy. [Plot: mean success rating against mean number of interviews (11 to 29), with the failure zone marked from 13 to 21 interviews.]
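The failure-zone comparisons are two-sample t tests. The article reports group sizes and means but not individual ratings or SDs. The sketch below therefore assumes the classical pooled-variance (Student) form and runs it on invented ratings, chosen only so that the group means match the reported 5.86 and 3.20; it does not reproduce the reported t of 2.86. The helper name is hypothetical.

```python
import math
from typing import Sequence


def pooled_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Student's two-sample t with pooled variance (df = n_x + n_y - 2)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mx - my) / se


if __name__ == "__main__":
    # Invented ratings for illustration only (not the study's data).
    short_case = [6, 5, 7, 6, 5, 6, 6]
    failure_zone = [3, 4, 2, 3, 4, 3, 2, 4, 3, 4]
    df = len(short_case) + len(failure_zone) - 2
    print(f"t = {pooled_t(short_case, failure_zone):.2f}, df = {df}")
```

For the short-case versus failure-zone comparison the test has 7 + 10 - 2 = 15 degrees of freedom.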

It might be asked whether this failure zone arises as a result of individual differences in clients or as a result of factors due to therapists. It might be, for example, that the 10 failure-zone clients were all seen by one or two therapists, but the data do not support such a hypothesis. In all, the 24 clients of Figure 3 were seen by 13 different therapists. The 10 clients in the failure zone were seen by seven different therapists. The seven clients falling in the relatively successful zone just prior to the failure zone were seen by six different therapists. The seven clients falling in the zone just after the failure zone were seen by seven different therapists. Five of the therapists who saw clients in the failure zone also saw clients in one or another of the adjacent relatively successful zones. From these findings, it seems possible to conclude that the factors responsible for the presence of the failure zone may be ascribed to the clients rather than to the therapists.


Discussion 


The findings about the relation between 
success ratings and length of therapy seem 
to suggest that certain individual differences 
between clients give rise to different kinds of 
therapeutic process. Two kinds of process 
may be immediately identified: “short” (1- 
12 interviews) and “long” (13-77 interviews). 

A first hypothesis in regard to the two 
therapies is that they differ in the kind of 
problems brought in by the client. It is pos- 
sible that short-case clients had mainly situa- 
tional problems, while long-case clients had 
mainly personality problems. The assignment 
of the failure zone to the beginning of long 
therapy rather than to the end of short 
therapy would seem consistent with this hy- 
pothesis. 

On this view, however, it would seem reasonable to suppose that the presence of the failure zone is connected with some factor or factors which block the client from continuing with long therapy until a rating of relative success is possible. Such a factor could perhaps be a discovery of deeply threatening aspects of self, or the anticipation of such discoveries if therapy is continued. On such a view, the failure zone would appear to be a drastic behavioral manifestation of resistance. If such a view were confirmed by further research, it would seriously raise the question of whether or not client-centered therapists should modify their approach to therapy to include the possibility of temporarily directive behavior during the critical failure zone, when it seems likely that a client will leave. An argument against such a position is that only the client can be aware of just how much anxiety his organism can tolerate at any particular point in time, and that his own motivation can be relied upon to bring him back to therapy as soon as he can tolerate the anticipated level of anxiety.

The questions that have been raised can be satisfactorily answered only by empirical investigation. As a first approach, it would seem of the utmost importance (a) to examine, by protocol analysis and all possible test measures, the individual differences between short and long therapy clients, (b) to do the same for failure-zone clients, and (c) to compare these with clients rated as relatively successful in both short and long therapy. In regard to the question of therapist behavior during the critical failure zone, it would seem that an empirical answer might be given by careful follow-up studies. If the client-centered position is correct, then it might be expected that a majority of the clients who leave therapy at some point during the failure zone will return to therapy of some kind at some later date.


Summary 


Counselor ratings of success in client-cen- 
tered psychotherapy for 78 clients were ex- 
amined in relation to variables of sex, age, 
student vs. nonstudent status, and length of 
therapy. It was found that neither sex nor
age was significantly related to degree of
rated success. Students were somewhat more 
successful than nonstudents, but the reasons 
for the difference are not known. The relation
between length of therapy and success rating
was complex, with the total sample falling 
into two groups. One group was composed of 
short-case clients, the other group of long- 
case clients. Within each group there was a 
strong positive relation between number of 
interviews and success rating. A “failure zone”
ranging around 17.5 interviews was inter- 
preted as a period during which potentially 
long-case clients dropped out of therapy. The 
presence of this failure zone was discussed in 
terms of its implications for the theory and 
practice of client-centered therapy. 


Received February 2, 1955. 


References 


1. Carr, A. C. An evaluation of nine nondirective 
psychotherapy cases by means of the Ror- 
schach. J. consult. Psychol., 1949, 13, 196-205. 

2. Dymond, Rosalind F. An adjustment score for Q
sorts. J. consult. Psychol., 1953, 17, 339-342.

3. Dymond, Rosalind F. Adjustment changes over 
therapy from Thematic Apperception Test 
ratings. In C. R. Rogers & Rosalind F. Dy- 
mond (Eds.), Psychotherapy and personality 
change. Chicago: Univer. of Chicago Press, 
1954. Pp. 109-120. 

4. Gordon, T., & Cartwright, D. The effect of 
psychotherapy upon certain attitudes toward 
others. In C. R. Rogers & Rosalind F. Dy- 
mond (Eds.), Psychotherapy and personality 
change. Chicago: Univer. of Chicago Press,
1954. Pp. 167-195.

5. Jonietz, Alice K. A study of the phenomenologi-
cal changes in perception after psychotherapy
as exhibited in the content of Rorschach per-
cepts. Unpublished doctor’s dissertation, Uni-
ver. of Chicago, 1950.

6. Muench, G. A. An evaluation of nondirective 
psychotherapy by means of the Rorschach 
and other tests. Appl. Psychol. Monogr., 1947, 
No. 13. 

7. Raskin, N. J. Analysis of six parallel studies of
the therapeutic process. J. consult. Psychol.,
1949, 13, 206-220.

8. Rogers, C. R. Changes in the maturity of be- 
havior as related to therapy. In C. R. Rogers 
& Rosalind F. Dymond (Eds.), Psychotherapy 
and personality change. Chicago: Univer. of
Chicago Press, 1954. Pp. 215-237.

9. Seeman, J. Counselor judgments of therapeutic 
process and outcome. In C. R. Rogers & Rosa- 
lind F. Dymond (Eds.), Psychotherapy and 
personality change. Chicago: Univer. of Chi- 
cago Press, 1954. Pp. 99-108. 

10. Vargas, M. J. Changes in self-awareness during
client-centered therapy. In C. R. Rogers & 
Rosalind F. Dymond (Eds.), Psychotherapy 
and personality change. Chicago: Univer. of 
Chicago Press, 1954. Pp. 145-166. 

Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 





Perceived Parental Attitudes, the Self, and Security¹


Sidney M. Jourard and Richard M. Remy 


Emory University 


A number of personality theorists have as- 
serted that an individual’s attitudes toward 
his own personality, or self, are acquired in 
some way from “significant others”: parents,
teachers, peers, etc. As these others define and 
evaluate the person, so will he come to define 
and evaluate himself. Thus, Sullivan states:
“The self may be said to be made up of re- 
flected appraisals. If these were chiefly de- 
rogatory ... then the self dynamism will 
itself be chiefly derogatory ... it will en- 
tertain disparaging and hostile appraisals of 
itself” (5, p. 10). In another context he 
states: “It is, therefore, the parents and sig-
nificant others, brothers, sisters, or nurse, who
determine the nature of the self-dynamism. 
. . . [The self] tends very strongly to main- 
tain the direction and characteristics given to 
it in childhood” (5, p. 131). Murphy formu- 
lates the supposed relation between parental 
appraisals and self-appraisals with testable 
specificity: “The tendency to value rather 
than disvalue the self is correlated with pa-
rental approval . . .” (3, p. 522). A direct
test of these propositions would entail a com- 
parison of parents’ attitudes toward their 
children with the children’s self-evaluations. 
A less direct but closely related approach 
would consist in the determination of the 
child’s beliefs concerning how his parents 
evaluate him, and then the comparison of 
these beliefs with self-evaluations. The latter 
approach would leave open the question of 
the validity of the beliefs. 

A problem related to parental- and self- 
evaluation is security. Personal security may 
be defined as the belief that one is adequate 
to handle life problems, and that one is well 
liked both by oneself and by significant
others. According to this definition, we would
expect that a person who believes that his
parents evaluate him positively, and who
evaluates himself positively, would be secure.

¹ The data on which this paper is based were taken
from a thesis conducted under the supervision of the
senior author, and submitted by Mr. Remy to the
Graduate School of Emory University in partial ful-
fillment of the requirements for the degree of Mas-
ter of Arts.

The present study was concerned with ex- 
ploring these hypothesized relationships among 
parental attitudes, self attitudes, and security. 
The attitudes of a person toward his body 
and self were compared to his concept of how 
his parents evaluated his body and self, and 
a study was made of the relation between 
these two factors and a measure of security. 


Method 


Hypotheses. 1. A person’s attitude toward 
his body varies with his beliefs concerning 
the attitude of his parents toward his body. 

2. A person’s attitude toward his self varies 
with his beliefs concerning his parents’ atti- 
tudes toward his self. 

3. The subjects (Ss) who believe their par- 
ents hold negative attitudes toward their (Ss’) 
bodies show signs of insecurity; conversely, 
Ss who believe their parents have positive 
attitudes toward their bodies show signs of 
security. 

4. Subjects who believe that their parents 
dislike many of their (Ss’) traits of self show 
signs of insecurity, and vice versa. 

Materials. A 40-item body-cathexis scale 
and a 40-item self-cathexis scale were the 
basic materials in the study. The scales, a re- 
vision of a form described elsewhere (4), list 
aspects of the body, such as chest, height, 
face, and traits of self such as temper, in- 
telligence level, and ability to accept criticism. 
Each S signified his attitude toward each of
these items in accordance with the following 
five-point scale: 


    Have strong positive feelings
    Have moderate positive feelings
    Have no feeling one way or the other
    Have moderate negative feelings
    Have strong negative feelings
Total scores for the inventory concerned
with parts of the body are described as Body-
Cathexis (BC) Scores; the score for the in-
ventory which lists traits of self is called the
Self-Cathexis (SC) Score.

In addition to the BC-SC scales, each S
filled out two duplicate scales: one with the
instruction to signify how he believed his
mother felt about each aspect of his body and
self, and one with the instruction to signify
how he believed his father felt about these
items. The relevant totals are described as
Perceived Attitudes of Mother (Father) To-
ward My Body (Self), and are symbolized as
follows: pMb, pFb, pMs, pFs.

Finally, the Maslow Test of Psychological
Security-Insecurity (Mas) (2) was adminis-
tered to each S.

Subjects. Ninety-nine undergraduate stu-
dents at Emory University constituted the
sample: 51 females and 48 males. The age
range of the group extended from 18 to 28
years, with a mean age of 21.5 and an SD of
2.5. About two-thirds of the Ss were drawn
from classes in introductory psychology and
mental hygiene, and the remainder were se-
lected at random from the various dormitories
on the campus.

Procedure. Each S completed four scales:
three identical BC-SC forms prefaced by dif-
ferent instructions, and the Maslow scale.
The testing was conducted on three separate
days. On Day 1, Ss completed the BC-SC
scale on the basis of their own feelings, and
also the Maslow scale. On Day 2, the BC-SC
scale was administered with the instruction
to respond as they believed their mothers felt.
On Day 3, the Ss filled out the BC-SC scale
as they believed their fathers felt.

Table 1

Intercorrelations Among BC-SC Scores, Perceived Parental Cathexes, and Security Scores

       Self-ratings    Perceived       Perceived
                       mother-ratings  father-ratings
Test   BC      SC      pMb     pMs     pFb     pFs
BC             .68**   .68**           .56**
               .84**   .74**           .68**
SC                             .77**           .66**
                               .70**           .65**
Mas    -.50**  -.66**  -.57**  -.63**  -.42**  -.49**
       -.37*   -.63**  -.32*   -.37*   -.33*   -.37*

Note. The upper r in each cell is for females (N = 51); the lower r is for males (N = 48).
* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.

Results 

Reliability of the scales. Split-half reliabili- 
ties, corrected by the Spearman-Brown for- 
mula, were computed for the cathexis scales. 
All the coefficients were above .91, signifying 
satisfactory reliability. 

Correlations were computed among the 
cathexis scales and the Maslow scale, and 
are shown in Table 1. 

Body-cathexis and self-cathexis. The BC 
scores correlated .68 with SC scores for 
women, and .84 for men. This finding is con- 
sistent with the earlier finding of Secord and 
Jourard (4, p. 345), although the present r’s 
are somewhat higher. 

Body-cathexis and parental cathexes. For 
both sexes, BC correlated significantly with 
perceived mother- and father-cathexes of the 
Ss’ bodies. The relevant r’s are .68 and .74 
between BC and pMb for females and males 
respectively, and .56 and .68 between BC and 
pFb for females and males. These findings 
support the hypothesis that body-cathexis 
varies with perceived parental attitudes to-
ward the Ss’ bodies.

Self-cathexis and parental cathexes. The 
predicted relationship was upheld by signifi- 
cant correlations for both sexes between SC 
and pMs and pFs. For females the relevant 
r’s were .77 and .66, and for males, .70 and 
.65. These results, and the ones above, show
clearly that valuations of one’s body and per- 
sonality covary with beliefs concerning how 
one’s mother and father evaluate these ob- 
jects. 

Self-cathexes, parental cathexes, and se- 
curity. It was hypothesized that persons who 
evaluated their bodies and personalities nega- 
tively, and who believed that their parents 
evaluated these objects in similar fashion, 
would be insecure. The Maslow scale was 
used as an independent measure of security 
in the present study. The bottom row of cor-
relations in Table 1 supports this hypothesis.
For women, all of the cathexis variables were 
correlated with the Maslow scores with sig- 
nificance levels beyond .01. For men, the cor- 
relations were lower than for women, but all 
reached the .05 level, and one exceeded the 
.01 level of significance. It should be stated 
that the male-female difference between each 
pair of r’s was not statistically significant. 
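The authors do not name the test behind this statement. A common choice, assumed in this sketch, is Fisher's r-to-z test for correlations from two independent samples: z = (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)), where each zi = atanh(ri). The example applies it to the BC-SC pair from Table 1; the function names are illustrative.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_independent_rs(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-tailed z test for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the standard normal
    return z, p


# BC-SC correlation from Table 1: .68 for women (N = 51), .84 for men (N = 48).
z, p = compare_independent_rs(0.68, 51, 0.84, 48)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For that pair z is about -1.89, short of the 1.96 required at the .05 level (two-tailed), which is consistent with the authors' remark.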


Discussion 


The data showed that self-rated cathexes 
for the body and the self, perceived parental 
cathexes of the body and self, and a measure 
of personal security, were intercorrelated to 
statistically significant degrees. The correla- 
tions, though substantial, do not signify per- 
fect covariance among these sets of factors. 
It is evident that some of the variance in 
self-rated cathexes is a function of variables 
other than the parental cathexes, and that 
only a small portion of the variance in the 
security scores can be accounted for by the 
self-rated and parental cathexes of the body 
and the self. 
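The share of variance two measures have in common is the square of their correlation. As a rough check on the passage above, the sketch squares the magnitudes of the Maslow correlations listed in Table 1 (signs are dropped because r² ignores them). It treats each correlation separately and does not estimate the combined, multiple-correlation figure the authors may have in mind.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures: the squared correlation."""
    return r * r


# Magnitudes of the correlations with the Maslow scale, as listed in Table 1.
maslow_r = {
    "women": {"BC": 0.50, "SC": 0.66, "pMb": 0.57, "pMs": 0.63, "pFb": 0.42, "pFs": 0.49},
    "men": {"BC": 0.37, "SC": 0.63, "pMb": 0.32, "pMs": 0.37, "pFb": 0.33, "pFs": 0.37},
}

for group, rs in maslow_r.items():
    for scale, r in rs.items():
        print(f"{group:5} {scale:3}  r = {r:.2f}  r^2 = {shared_variance(r):.2f}")
```

For these values r² runs from about .10 (men, pMb) to about .44 (women, SC).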

The correlations which were obtained, how- 
ever, are consistent with the predictions 
drawn from the writings of Sullivan and 
Murphy; yet they do not constitute proof of 
their assertions. It must be remembered that 
the present study did not deal with the real 
attitudes of the Ss’ parents, but only with the 
Ss’ beliefs concerning how the parents would 
or did evaluate their children’s bodies and 
personalities. These beliefs could be autistic 
as well as veridical—the present data do not 
enable us to distinguish which is the case. 
Nevertheless, it may be argued that Ss never 
deal with reality as such, but rather with an 
assumptive reality. If it is indeed true that 
self-evaluations are determined by parental 
evaluations of one’s self, then it follows that 
if a person believes that his parents approve 


of his traits, even though this belief be false, 
he will tend to approve of his traits as well. 
Our data support this formulation. 

The correlations between the cathexis scales 
and the Maslow security inventory suggest 
that all of these factors comprise what Mas-
low has termed a “personality syndrome”:
. . . “a structured organized complex of
apparently diverse specificities (behaviors, 
thoughts, impulses to action, perceptions, 
etc.) which, however, when studied carefully 
and validly are found to have a common 
unity that may be phrased variously as a 
similar dynamic meaning, expression, flavor, 
function, or purpose” (1, p. 32). The name 
that one gives to a syndrome is arbitrary, de- 
pending perhaps on one’s point of departure 
in computing correlations. Thus, we might 
speak here of the “security syndrome,” or of 
the “body-cathexis syndrome.” What we are 
basically describing is a correlation matrix. 


Summary and Conclusions 


Self-rated cathexes for the body and the 
self were found to be significantly correlated 
with perceived parental cathexes. Both ca- 
thexes also correlated significantly with scores 
on the Maslow Test of Psychological Se- 
curity-Insecurity. 

It may be concluded that self-appraisals 
covary with a person’s perception or belief 
concerning his parents’ appraisals of him; 
whether the self-appraisals vary with the real 
parental appraisals of the individual is still 
an open question, but one which could be in- 
vestigated empirically. 

Finally, negative self-appraisal, and per- 
ceived negative parental appraisals of the 
body and self, are correlates of psychological 
insecurity. 


Received January 31, 1955. 


References 


1. Maslow, A. H. Motivation and personality. New 
York: Harper, 1954. 

2. Maslow, A. H., Hirsh, E., Stein, M., & Honig- 
mann, I. A clinically derived test for measur- 
ing psychological security-insecurity. J. gen. 
Psychol., 1945, 33, 21-42. 

3. Murphy, G. Personality: a biosocial approach to 
origins and structure. New York: Harper, 1947. 

4. Secord, P. F., & Jourard, S. M. The appraisal of 
body-cathexis: body-cathexis and the self. J. 
consult. Psychol., 1953, 17, 343-347. 

5. Sullivan, H. S. Conceptions of modern psychiatry. 
Washington: William Alanson White Psychi- 
atric Foundation, 1947. 




















Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 





Children’s Responses to Human and Animal 
Stories and Pictures 


Nancy A. Boyd and George Mandler 


Harvard University 


Psychoanalytic theories of personality de- 
velopment and the projective test literature 
have frequently dealt with children’s reac- 
tions to animal as opposed to human figures. 
While increasing similarity between identifier 
and identified presumably increases identifi- 
cation, Freud pointed out in the case of Little 
Hans that children do not stress the gulf be- 
tween the human and the animal world (7). 
Bellak and Bellak (2) state that “children 
will frequently identify more readily with ani-
mals than with human figures. . . .” How-
ever, Goldfarb has suggested that this prone-
ness to animal identification goes through
progressively weakening stages with increas- 
ing age (8). Thus, one generally accepted hy- 
pothesis states that children tend to identify 
as much, or more, with animal figures than 
with human figures, and that this process of 
animal identification decreases with age. 

A second hypothesis stated by Blum (5) is 
that animal characters (i.e., stimuli) “facili- 
tate freedom of personal expression in situa- 
tions where human figures might provoke an 
unduly inhibiting resistance. . . .” It is pri- 
marily this hypothesis which has led to the 
development of projective tests that use ani- 
mal rather than human figures. 

Another area of investigation relevant to 
the imaginative productions of children is re-
lated to the type of story and reading ma-
terial with which the child is habitually con-
fronted. Preschool children’s stories are typi-
cally dominated by animal figures. On the
other hand, Child, Potter, and Levine (6) 
found that third-grade readers contained al- 
most three times more human figures in every- 
day situations than any other type of story 
character. Human characters were most fre- 
quently depicted in favorable situations, while 
animal and fairy figures were most often por- 
trayed in undesirable roles. Thus, on the basis 
of the child’s experiences with story material, 
we would expect his stories to display an in- 
creasingly human content as the child grows 
older, with a tendency for socially undesir- 
able roles to remain identified with animal 
characters. 

The experimental evidence bearing upon 
these hypotheses is contradictory. Bills (4) 
presents findings which suggest that children 
between the ages of five and ten give more 
lengthy story productions to animal than to 
human pictures. But Light (9), comparing 
CAT and TAT pictures, found that fourth 
and fifth graders showed more evidence of 
“identification” (e.g., expression of feeling, 
variations in themes) with human than with 
animal pictures. Armstrong (1) also finds 
that human pictures evoked more verbs, 
nouns, and ego-related words in first, second, 
and third graders. Finally, Biersdorf and 
Marcuse (3), using six of Bellak’s CAT pic- 
tures matched with six pictures identical in 
scene and situation but using human figures, 
found no significant differences in story 
length and number of ideas and characters 
introduced by their subjects (Ss) (first grade 
children). 

In the present study we have extended the 
problem to investigate not only the Ss’ differ- 
ential reactions to human and animal stimu- 
lus pictures, but also the Ss’ reactions to more 
structured stimulus material, i.e., human and 
animal stories. By presenting Ss with both 
stimulus stories and pictures, it was hoped to 
gain a more extensive picture of children’s re- 
action to human and animal figures. Third- 
grade children were chosen, mainly for their 
ability to write their own stories. This was 
feasible since the hypotheses mentioned above 
are presumed to be valid for children from 
three to ten years of age. 


Method 


The basic design of the study involved 
eight different treatment groups (cf. Table 1). 
All Ss were exposed to the same procedure; 
they were told two stories, and following each 
of these stimulus stories they were presented 
with a stimulus picture in response to which 
they were required to write their own stories. 


Subjects 


The Ss were 96 third-graders from two pub- 
lic elementary schools. Twenty-four Ss came 
from each of four separate classes. More Ss 
were originally tested, but some were dis- 
carded at random in order to equate the num- 
ber of Ss in each experimental group. There 
was no significant difference between classes 
in their Kuhlmann-Anderson IQ scores (Mean
= 101.3) or age (Mean = 8 years, 5 months). 
The Ss came from two classes in each of 
the two schools, which are four blocks apart
and draw pupils from similar socioeconomic
groups. Pupil assignment to these four class- 
rooms is not based on any known variables. 


Design 


Table 1 shows the eight treatment cells 
evolved from a 2 × 2 × 2 factorial design
based on three factors: (a) type of stimulus
story, (b) content of stimulus story, and (c)
type of stimulus picture.

A total of eight stimulus stories, four human 
and four animal, and eight complementary 
pictures were used. Four story plots were con- 
structed in such a way that either child or 
animal characters could be used without 
changing scene, situation or events, resulting 
in eight stories. Two of these plots were char- 
acterized as of “good” content, two as of 
“bad” content. The “good” stories depict the 
main figure engaging in socially approved be- 
havior, which was usually rewarded in the 
story. The “bad” stories pictured socially dis- 
approved behavior. In order to control for 
any possible sex differences, half of the stories 
had a male, and half a female main character. 

Table 1

Design of the Experiment

                    Animal stories                Human stories
                    Good          Bad             Good          Bad
                    School A      School B        School A      School B
Pictures            Class I       Class II        Class III     Class IV
Animal pictures     Story a       Story c         Story a’      Story c’
                    Picture a     Picture c       Picture a     Picture c
                    Story b       Story d         Story b’      Story d’
                    Picture b     Picture d       Picture b     Picture d
Human pictures      Story a       Story c         Story a’      Story c’
                    Picture a’    Picture c’      Picture a’    Picture c’
                    Story b       Story d         Story b’      Story d’
                    Picture b’    Picture d’      Picture b’    Picture d’

Note. Contents and events except for main characters are identical for all stories and pictures with the same letter designations; prime designation indicates change in character from animal to human. N = 12 in each cell.


The four basic stimulus pictures, following 
Rotter’s (10) and Symonds’s (11) criteria,
showed some ambiguous action involving the 
main characters of the stories with a mini- 
mum of detail. Each of these pictures had 
two versions, one containing animal, the other 
human figures, with identical backgrounds. 

Each of the four classes was told two 
stimulus stories. The pictures were mimeo- 
graphed and assembled into booklets with 
space provided for responses to the pictures. 
By random assignment, half of the children 
in each class were given pictures with animal 
characters, and half were given pictures with 
human characters. Two of the classes were 
told animal stories, two were told human 
stories; two were told “good” stories, and 
two “bad” stories. 
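The assignment scheme can be sketched as a short simulation. This is a minimal Python illustration, not the authors' procedure: it assumes exactly 24 pupils per class (after the random discards described under Subjects) and the class-to-condition mapping of Table 1, and the names `CLASSES`, `assign_pictures`, and `cell_counts` are hypothetical. Story type and content vary between classes; picture type is randomized within each class, which yields the eight cells of 12.

```python
import random
from collections import Counter

# Between-class conditions (story type, content), following Table 1.
CLASSES = {
    "Class I": ("animal", "good"),
    "Class II": ("animal", "bad"),
    "Class III": ("human", "good"),
    "Class IV": ("human", "bad"),
}


def assign_pictures(class_size: int = 24, seed: int = 0) -> list[tuple[str, int, str, str, str]]:
    """Randomly give half of each class animal pictures and half human pictures."""
    rng = random.Random(seed)
    assignments = []
    for class_name, (story_type, content) in CLASSES.items():
        half = class_size // 2
        pictures = ["animal"] * half + ["human"] * half
        rng.shuffle(pictures)  # pupil k receives pictures[k - 1]
        for pupil, picture_type in enumerate(pictures, start=1):
            assignments.append((class_name, pupil, story_type, content, picture_type))
    return assignments


def cell_counts(assignments: list[tuple[str, int, str, str, str]]) -> Counter:
    """Tally subjects per (story type, content, picture type) cell of the 2 x 2 x 2 design."""
    return Counter((story, content, picture) for _, _, story, content, picture in assignments)


counts = cell_counts(assign_pictures())
print(len(counts), sum(counts.values()))  # 8 96
```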


Procedure 


The same experimenter conducted the study 
in all four classes. She explained to the chil- 
dren that she wanted to find out what kind of 
stories boys and girls liked best, that she had 
written some stories which she would read to 
them, and that she would like them to write 
some stories for her. The children seemed well 
motivated and eager to cooperate. On the first 
page of the booklets the Ss wrote their names 
and answered questions concerning story pref- 
erences, and two short questions referring to 
values covered by the stories. Then the first 
story was told, 5-10 minutes in length. After 
answering two short questions concerning 
their like or dislike of the story told, the chil- 
dren were given 15 minutes to write their 
own stories about the first picture in the 
booklet. The Ss were requested to examine 
the picture carefully, to try to imagine what 
had happened before, what was going on now, 
and what would occur in the future, and to 
write a story about the picture. They were 
told that spelling and punctuation were un-
important, and that the experimenter (E)
was most interested in their ideas and 
thoughts about what was happening in the 
picture. The same procedure was then fol- 
lowed for the second story. 


Analysis of Data 


The stories of the 96 Ss were analyzed in 
reference to the following eight variables. 
Each of these measures is presumed to relate 
to personal involvement. 

1. Story length. This measure consisted of 
the number of words contained in the stories. 

2. Original ideas. An original idea was de- 
fined as a phrase consisting of a subject and 
a predicate that was not stated or implied in 
the story told by the E. 

3. Value judgments. A value judgment was 
defined as an expression of approbation or 
disapproval toward the hero of the story. 

4. Punishment. This measure was defined 
as an occurrence of an event unpleasant to 
the hero, in consequence of some action in 
which he has engaged either deliberately or 
inadvertently. 

5. Reward. Reward was defined as the oc- 
currence of an event pleasant to the hero as 
a consequence of a commendable action or 
attitude. 

6. Occurrence of “I.” Number of occur- 
rences of the pronoun “I.” 

7. New theme. This measure was defined 
as the presence or absence of divergences 
from the main theme of the story. 

8. Formal features. This measure refers to 
the number of words used for formal begin- 
nings and conclusions to the stories, i.e., title 
and “The end.” 
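The three counting measures (1, 6, and 8) lend themselves to mechanical scoring. The Python sketch below is one possible reading, not the authors' scoring rules: it assumes the title is available separately, that a closing "The end" is counted as a formal feature rather than as story length, and that a word is a run of letters or apostrophes. `counting_measures` is a hypothetical name.

```python
import re

WORD = re.compile(r"[A-Za-z']+")  # assumption: a word is a run of letters/apostrophes
ENDING = re.compile(r"\bthe end\W*$", re.IGNORECASE)


def count_words(text: str) -> int:
    return len(WORD.findall(text))


def counting_measures(title: str, body: str) -> dict[str, int]:
    """Score Measures 1, 6, and 8 for one story.

    Assumptions: the title is given separately, and a closing "The end" is
    counted as a formal feature rather than as part of the story length.
    """
    body = body.rstrip()
    ending = ""
    match = ENDING.search(body)
    if match:
        ending = match.group(0)
        body = body[: match.start()]
    return {
        "story_length": count_words(body),  # Measure 1
        "i_count": len(re.findall(r"\bI\b", body)),  # Measure 6
        "formal_features": count_words(title) + count_words(ending),  # Measure 8
    }


print(counting_measures("The Lost Dog", "I saw a dog. I took it home. The end."))
# {'story_length': 8, 'i_count': 2, 'formal_features': 5}
```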

Measures 1, 6, and 8 are objective “count- 
ing” indices. The correlation for two judges 
for Measure 2 was .89. Measures 3, 4, and 5 
resulted in score discrepancies of 1 or less for 
80 per cent of the Ss on Measure 3, and 98 
per cent of the Ss on Measures 4 and 5. The
two judges also agreed on 92 per cent of 
the stories on Measure 7. An analysis of the 
measures where scoring discrepancies oc- 
curred showed no significant differences for 
cell totals between the two judges. 


Results and Discussion 
Story Preferences 


Seventy-four per cent of all Ss expressed a 
preference for animal stories over human 
stories. No significant differences were found 
in respect to this variable between the eight 
treatment groups. 


Influence of Stimulus Stories 


Measures 1 and 2 (Story Length and Origi-
nal Ideas) were subjected to an analysis of
variance, while all others were more appro-
priately analyzed by chi-square statistics.
Table 2 shows the confidence levels for all
experimental variables and interactions.
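The paper reports only significance levels. As a rough illustration of how a balanced 2 × 2 × 2 analysis of variance produces the seven sources listed in Table 2, the sketch below codes each factor as +1/-1 and uses the standard result that, with equal cell sizes, each effect's sum of squares is the squared contrast total divided by N. It assumes equal n per cell (12 in the study) and returns F ratios on 1 and N - 8 degrees of freedom; turning them into p-values needs the F distribution (for example `scipy.stats.f.sf`), left out here to stay within the standard library. `factorial_anova_2x2x2` is a hypothetical name.

```python
from itertools import product
from statistics import mean

FACTORS = ("stories", "pictures", "content")


def factorial_anova_2x2x2(cells: dict[tuple[int, int, int], list[float]]) -> dict[str, float]:
    """F ratios for a balanced 2 x 2 x 2 design.

    `cells` maps a (+1/-1, +1/-1, +1/-1) code for (stories, pictures, content)
    to that cell's scores; every cell must contain the same number of scores.
    """
    sizes = {len(ys) for ys in cells.values()}
    if len(cells) != 8 or len(sizes) != 1:
        raise ValueError("need 8 cells with equal n")
    n_total = 8 * sizes.pop()

    # Pooled within-cell (error) mean square on N - 8 df.
    ss_within = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in cells.values())
    ms_within = ss_within / (n_total - 8)

    results = {}
    # Each non-empty subset of factors is a main effect or an interaction.
    for mask in product((0, 1), repeat=3):
        if not any(mask):
            continue
        name = " x ".join(f for f, used in zip(FACTORS, mask) if used)
        contrast = 0.0
        for code, ys in cells.items():
            sign = 1
            for c, used in zip(code, mask):
                if used:
                    sign *= c  # interaction codes are products of main-effect codes
            contrast += sign * sum(ys)
        ss_effect = contrast ** 2 / n_total  # 1 df per effect in a 2^k design
        results[name] = ss_effect / ms_within
    return results


# Hypothetical scores (n = 3 per cell): only the stimulus-story factor shifts the means.
demo = {code: [9 + 2 * code[0], 10 + 2 * code[0], 11 + 2 * code[0]] for code in product((1, -1), repeat=3)}
for effect, f in factorial_anova_2x2x2(demo).items():
    print(f"{effect:30} F(1, 16) = {f:.2f}")
```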

Human stimulus stories showed a signifi-
cantly greater effect on the Ss’ subsequent
stories than did animal stimulus stories.
In seven out of eight measures the human
stories produced significantly larger values. It
appears that our Ss, while exposed primarily
to human material in their reading matter
(6), prefer animal stories, but still tend to be
more involved by human stories.


Influence of Stimulus Pictures 


Animal stimulus pictures exerted a signifi- 
cantly greater influence than human stimulus 
pictures for two of the eight measures. Four
of the remaining six measures also showed a
nonsignificant effect in the same direction.
Thus the response to the pictures parallels 
the Ss’ expressed preferences but tends to run 
counter to their reaction to the stimulus 
stories. Our data support the contention that 
animal stimulus pictures tend to elicit more 
emotional material. 


Interaction Between Stimulus Stories and 
Pictures 
The nature of the interaction between
Stories and Pictures, which was significant in
five out of eight measures, varies somewhat
from measure to measure. For four of these
measures, however, the general direction is
the same. For Original Ideas, tendency to
switch to a New Theme, presence of “I,”
and Formal Features, there is a significantly
greater incidence of these indices for those
groups where the main characters switch from
animal to human and vice versa. The cells
where the main character remains the same
show a smaller incidence than expected. The
smallest of these deviations from expectancy
usually occurs for the group that heard hu-
man stimulus stories and was given animal
pictures. The Story × Picture interaction for
Value Judgments shows an effect in the op-
posite direction, i.e., a significant deviation in
the direction of increased numbers of value
judgments when the main character stays the
same.

Table 2

Levels of Confidence for the Eight Dependent Variables

Measures: (1) Length; (2) Ideas; (3) Value judgments; (4) Punishment; (5) Reward; (6) “I”; (7) New theme; (8) Formal features.

Source of variation            Levels of confidence
Stimulus stories               .001  .01  .01  .01  .01  .001
Pictures                       .05  .01
Content                        .01  .07
Stories × pictures             .001  .01  .001  .001
Stories × content              .001
Pictures × content             .05  .001
Stories × pictures × content   .05  .01

Note. These values are derived from analyses of variance for Measures 1 and 2, and from chi-square statistics for the other measures.

The fact that changes in the nature of the 
main character tend to lead to an increase 
in such indices as New Theme and Original 
Ideas is to be expected. If the character re- 
mains the same, the stimulus pictures are 
close enough to the stimulus stories so as not 
to evoke stories significantly different from 
the stimulus stories. 


Influence of the Content of the Stimulus 
Stories 


There is a significantly greater occurrence 
of punishment incidents in response to “bad” 
stimulus stories and a significantly more fre-
quent appearance of the pronoun “I” in re-
sponse to “good” stimulus stories. In addi-
tion, the Picture × Content interaction shows
an increase in the appearance of punishment 
incidents in response to human pictures fol- 
lowing “bad” stories, and in response to ani- 
mal pictures following “good” stories. Thus, 
the presence of socially disapproved behavior 
in the stimulus stories facilitates the expres- 
sion of negative attitudes toward the main 
character, particularly toward human char- 
acters. Punishment is expressed toward ani- 
mal figures without any previous arousal of 
negative affect. It should be noted that Value 
Judgments again show an effect in the op- 
posite direction, i.e., their incidence increases 
to human pictures following “good” stimulus
stories and to animal pictures following
“bad” stimulus stories.
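For the chi-square measures, a content effect such as the one on punishment amounts to comparing two proportions in a 2 × 2 table. The paper does not give its tables or say whether a continuity correction was applied, so the sketch below assumes a plain Pearson chi-square on subjects with and without a punishment incident, with invented counts; `chi_square_2x2` is a hypothetical name.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, chi-square is a squared standard normal, so p = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = "bad" vs "good" stimulus stories (48 Ss each),
# columns = punishment incident present vs absent.
chi2, p = chi_square_2x2([[30, 18], [18, 30]])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```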

The interpretation of triple interactions is 
always difficult without previous predictions. 
It is interesting to note that all Ss introduced 
new themes in response to animal pictures 
following “good” human stimulus stories, 
while only 58 per cent of the Ss introduced 
new themes in response to animal pictures 
following “bad” human stimulus stories. 
Thus, the presence of anxiety-arousing ma- 
terial tended to inhibit the expression of 
originality in response to animal pictures. A 
similar effect was found for the Original 
Ideas measure. 

A surprising result was the highly signifi- 
cant effect of the experimental variables on 
the occurrence of formal features in the Ss’ 
productions. Such aspects of the children’s 
stories may be an expression of the Ss’ need 
to make the stories acceptable and conform 
to the usual type of story the child is ex- 
posed to. On the other hand, formal features 
may take the place of more original material 
when anxiety blocks the child from expressing 
his feelings. A combination of these two fac- 
tors probably accounts for the large effects 
found for this measure. 

Our results tend to support the hypothesis 
that animal pictures facilitate the expression 
of ego-involvement, particularly of negative 
affect. However, the overwhelming effect of 
human stimulus stories on the production of 
imaginative material fails to support a gen- 
eral theoretical assumption of children’s pri- 
mary identification with animals. In addition, 
there is evidence that stimulus stories with 
socially disapproved content led to the ex- 
pression of punishing attitudes, while socially 
approved content facilitated a greater involve- 
ment of the self in subsequent productions. 
The expressed preference for animal stories 
may be related to the fact that socially dis- 
approved behavior in animals is less anxiety- 
arousing than in human subjects. Thus, less 
punishment is expressed toward animals fol- 
lowing these anxiety-arousing stories, but the 
expression of value judgments is facilitated. 


Summary 


This study investigated the responses of 96 
third-grade children to stimulus stories and 
stimulus pictures, when the main characters 
were either human or animal. Following ex- 
posure to short stimulus stories, the Ss wrote 
15-minute stories in response to ambiguous 
pictures. These stories were analyzed in re- 
spect to productivity and other indices of ego- 
involvement. The main findings were: 

1. Stimulus stories with human characters 
elicit more involvement than animal stories. 

2. Animal pictures tend to elicit more origi- 
nal material than human pictures. 


3. Changes in the main character from 
stimulus stories to pictures resulted in greater 
involvement than when stories and pictures 
both had human (or animal) characters. 

4. Socially disapproved behavior in the 
stimulus stories elicits the expression of pun- 
ishment. 

5. Socially approved behavior in the stimu- 
lus stories elicits more projection of the self. 

6. Socially disapproved behavior by hu- 
man characters apparently arouses more anx- 
iety than such behavior by animal characters. 


Received February 9, 1955. 


References 


1. Armstrong, Mary A. S. Children’s responses to animal and human figures in thematic pictures. J. consult. Psychol., 1954, 18, 67-70.

2. Bellak, L., & Bellak, Sonia. An introductory note on the Children’s Apperception Test. J. proj. Tech., 1950, 14, 173-180.

3. Biersdorf, Kathryn, & Marcuse, F. L. Responses of children to human and animal pictures. J. proj. Tech., 1953, 17, 455-459.

4. Bills, R. E. Animal pictures for obtaining children’s projections. J. clin. Psychol., 1950, 6, 291-293.

5. Blum, G. S. A study of the psychoanalytic theory of psychosexual development. Genet. Psychol. Monogr., 1949, 39, 3-99.

6. Child, I. L., Potter, Elmer H., & Levine, Estelle M. Children’s textbooks and personality development. Psychol. Monogr., 1946, 60, No. 3, 1-54 (Whole No. 279).

7. Freud, S. Inhibitions, symptoms and anxiety. London: Hogarth, 1936.

8. Goldfarb, W. The animal symbol in the Rorschach test and an animal association test. Rorschach Res. Exch., 1945, 9, 8-22.

9. Light, B. A comparative study of a series of TAT and CAT cards. J. clin. Psychol., 1954, 10, 179-181.

10. Rotter, J. B. Studies in the use and validity of the TAT with mentally disordered patients. Charact. & Pers., 1940, 9, 8-34.

11. Symonds, P. Criteria for the selection of pictures for the investigation of adolescent fantasies. J. abnorm. soc. Psychol., 1939, 34, 271-274.





Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


The Iowa Picture Interpretation Test: A Multiple-Choice Variation of the TAT¹


John R. Hurley 


State University of Iowa²


The Iowa Picture Interpretation Test 
(IPIT) is the product of an attempt to inte- 
grate the objective and quantitative advan- 
tages of traditional paper and pencil person- 
ality measures with the so-called “depth” of 
projective techniques. This report is a brief 
account of the IPIT’s methodology and pres- 
entation of its normative, reliability, and in- 
tercorrelational characteristics. Some implica- 
tions of these data are discussed. 


Development of the IPIT 


Interest in the IPIT was originally stimulated by the need Achievement work of the Wesleyan group. McClelland, Atkinson, Clark, and Lowell (4) have found TAT responses involving this facet of personality, as measured by trained raters employing a quasi-subjective scoring method, related to a wide variety of other behaviors. While their measuring technique may be granted to possess established utility, both the special training required to employ it competently and the appreciable interrater variability remaining between even highly skilled raters suggest the potential advantages of a more conventional and objective measure of this seemingly important personality characteristic. Construction of such a measure was one goal of the IPIT’s development.

¹ In part this article is an adaptation of a doctoral dissertation in the Department of Psychology, State University of Iowa. The author is indebted to Drs. J. M. Daily, I. E. Farber, W. G. Dahlstrom, L. D. Goodstein, and J. G. Smith for their substantial contributions to the initial IPIT version, and to Drs. Farber and Goodstein for the formulation of the response class definitions described in this paper. The author gratefully acknowledges the invaluable guidance of Dr. Farber throughout all phases of the IPIT’s development.

² Now at Michigan State University.

Adoption of a multiple-choice approach, 
however, made it feasible to devise a measure 
that would concurrently assess other person- 
ality factors. It was decided to offer four al- 
ternative choices to each of the following TAT 
cards: 1, 2, 4, 6BM, 7BM, 7GF, 8BM, 13, 
14, and 17BM. For each of these TAT cards 
sets of four alternative choices were formu- 
lated which consisted, in every case, of brief 
statements representing each of the following 
response classes (as examples the choices 
used with TAT cards 1 and 4 are given in 
brackets after their respective definitions):

a. Achievement Imagery. A person high in 
achievement imagery (AI) is one who by 
word or action habitually indicates a desire to 
compete successfully with a standard of ex- 
cellence. He indicates by word or act that 
successful competition with certain groups or 
individuals, or high accomplishment in terms 
of social standards, would be accompanied by 
feelings of success; he attempts, or verbalizes 
an interest in attempting, some unique ac- 
complishment that would imply personal suc- 
cess; or he indicates by word or action some 
long-term involvement of a sort that would 
imply anticipation of successful competition 
or goal achievement. (He is dreaming of the 
day when he will become a great musician. 
He is telling her he must leave home because 
opportunities are greater in the big city.) 

b. Insecurity. An insecure individual is one 
who by word or action indicates that he has 
failed or anticipates failure to attain a de- 
sired goal, named or implied. He verbalizes 
actual or anticipated personal experiences,
feelings, or fears of deprivation or threat of
deprivation of some positively valued goal, 
especially of a social nature, e.g., affection, 
esteem, security, etc. Individuals who respond 
to failure or anticipation of failure by aggres- 
sive acts or statements are specifically ex- 
cluded from this category. (He is afraid that 
he will never be able to play the violin well. 
He is telling her that he has just lost his job 
and has little hope of finding another.) 

c. Blandness. A bland individual is one who 
habitually depersonalizes situations or events. 
He acts or speaks in a manner implying lack 
of personal involvement. His self-references 
and references to others are guarded or non- 
committal with respect to the expression of 
feelings, moods, or motives. (His violin is on 
the table and he is waiting for his music les- 
son. He is going to look for another room be- 
cause her boarding house is full.) 

d. Hostility. A hostile person is one who habitually verbalizes feelings of annoyance, anger, or resentment. He acts, or verbalizes intentions or desires to act, in a punitive, threatening, or injurious manner toward others. (He is angry at his mother because she makes him practice while he’d rather be outside playing. He is telling her that she must enter an old-age home because he refuses to support her any longer.)

A list of forty tentative choices (one representing each of the four described classes consistent with the content of each TAT card) and copies of these definitions were submitted to five clinical psychologists. These judges were requested to indicate which of the four defined response classes, if any, was most consistent with each choice. Only one choice was assigned by a majority of judges to a classification other than the one for which it had been devised. One or two judges classified seven choices as representing a different class than that for which they had been intended. Five of these eight discrepant choices were altered in a manner intended to improve consistency with their respective definitions. The three remaining were all AI choices assigned by the same judge to the Insecurity classification, or vice versa. In the opinion of the IPIT constructors these disagreements were due to unwarranted inferences concerning the “underlying” motives, so these choices were not altered.

This utilization of experts’ ratings, combined with modification of nondiscriminating choices in the initial IPIT version, produced a significant (p < .05) increment in the AI score internal consistency coefficient of the revision used to collect the present data.


Administration and Scoring 


The revised IPIT was administered to groups, ranging in size from 15 to 65, of undergraduate students in introductory psychology courses at the State University of Iowa. After providing Ss with IPIT instructions which included the example described by Goodstein (1) and sets of the alternative choices,³ lantern slide reproductions of the
ten TAT cards were projected onto a screen 
in a semidarkened room. Each slide was ex- 
posed for 50 seconds while Ss ranked, from 1 
to 4, the appropriate choices according to 
their personal preferences. 

Scores representing Ss’ over-all preferences 
for the four choice classes were derived by 
summing the ranks assigned by each S to the 
ten choices of each response class. Thus, the 
lower the numerical score, the higher the 
over-all preference expressed for the corre- 
sponding class of choices. 
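The scoring rule just described is simple enough to state as code. The sketch below assumes each subject's responses are stored as one dictionary per TAT card, mapping a class label to the rank given that class's choice; the labels and the function name `ipit_scores` are illustrative and are not part of the published test materials.

```python
# Illustrative labels for the four IPIT response classes (not official names).
CLASSES = ("AI", "Insecurity", "Blandness", "Hostility")


def ipit_scores(rankings):
    """Sum the ranks a subject gave each response class across all cards.

    `rankings` holds one dict per TAT card, mapping each class label to the
    rank (1 = most preferred, 4 = least preferred) assigned to that class's
    choice. Lower totals mean a stronger over-all preference for the class.
    """
    totals = {cls: 0 for cls in CLASSES}
    for card in rankings:
        # Every card must be a complete ranking: each class once, ranks 1-4.
        if set(card) != set(CLASSES) or sorted(card.values()) != [1, 2, 3, 4]:
            raise ValueError(f"incomplete or invalid ranking: {card}")
        for cls, rank in card.items():
            totals[cls] += rank
    return totals


# A hypothetical subject who ranks the classes in the same order on all ten cards.
same_order = {"AI": 1, "Insecurity": 2, "Blandness": 3, "Hostility": 4}
print(ipit_scores([same_order] * 10))
# {'AI': 10, 'Insecurity': 20, 'Blandness': 30, 'Hostility': 40}
```

With ten cards each score can range only from 10 to 40, and because the ranks on every card sum to 10, a subject's four scores always total 100.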


Normative Data 


IPIT score means and SD’s for 455 Iowa 
Ss less than 25 years of age are given in 
Table 1. These data show that the AI choices 
received the highest preference rating, fol- 
lowed, in that order, by Insecurity, Bland- 
ness, and Hostile choice classes. All prefer- 
ence rank differences between these classes 
were significant (p < .001) by the ¢ test. 

The Insecurity scores revealed the only significant (p < .05) sex mean score difference, with the females expressing the higher preference for these responses. The Hostility and Insecurity scores of female Ss were also significantly (p < .01) more variable than those of males.

³ A full account of the IPIT instructions and sets of alternative responses has been deposited with the American Documentation Institute. Order Document No. 4668 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for microfilm and $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

Table 1

Means, SD’s, and Between-Sex t Values of Scores on the Iowa Picture Interpretation Test

                                                  Sex diff.
IPIT Score                 N      Mean     SD          t†
Achievement Imagery
    Men                   213    20.92    3.72        0.60
    Women                 242    20.71    3.66
    Both Sexes            455    20.80    3.69
Insecurity
    Men                   213    24.15    2.13        [illegible]*
    Women                 242    23.62    3.25
    Both Sexes            455    23.87    2.80
Blandness
    Men                   213    26.88    3.47        0.54
    Women                 242    27.06    [illegible]
    Both Sexes            455    26.97    3.54
Hostility
    Men                   213    28.23    [illegible] 1.16
    Women                 242    28.62    4.05
    Both Sexes            455    28.44    3.66

* p < .05 for two-tailed test.
† t values computed by the Cochran-Cox approximation method.
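For the between-sex t values in Table 1, which the table note says were computed by the Cochran-Cox approximation, the test statistic is the mean difference divided by the square root of s₁²/n₁ + s₂²/n₂. The Cochran-Cox method then judges that statistic against a weighted average of the tabled t values for each group's degrees of freedom (weights s²/n), a step not shown here. The sketch below recomputes the Achievement Imagery t from the table's rounded summary values, together with the Insecurity variance ratio underlying the sex difference in variability noted in the text. The function names are illustrative, and the article does not say which test of variability it used, so the F ratio is only descriptive.

```python
import math


def unequal_variance_t(m1, s1, n1, m2, s2, n2):
    """t' = (m1 - m2) / sqrt(s1**2/n1 + s2**2/n2), the two-sample statistic
    used when the two population variances are not assumed equal."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def variance_ratio(sd_a, sd_b):
    """F ratio: the larger sample variance divided by the smaller one."""
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)


# Achievement Imagery, men vs. women (rounded values from Table 1).
t_ai = unequal_variance_t(20.92, 3.72, 213, 20.71, 3.66, 242)
# Insecurity SDs: men 2.13, women 3.25.
f_ins = variance_ratio(2.13, 3.25)
print(round(t_ai, 2), round(f_ins, 2))  # about 0.61 and 2.33
```

The recomputed t (about 0.61) differs from the tabled 0.60 only because the published means and SDs are rounded.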


Reliability 


A close rank-order relationship obtains be- 
tween the IPIT score test-retest and internal 
consistency coefficients. Based upon the scores 
of 100 Michigan State College introductory 
psychology students, the test-retest coeffi- 
cients, over a six-week retest interval, were: 
AI .52, Insecurity .46, Blandness .57, and 
Hostility .54. 
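The claimed rank-order agreement can be checked directly. Ranking the four test-retest coefficients above and the four internal consistency coefficients reported in the next paragraph gives the same order in both cases (Blandness, Hostility, AI, Insecurity), so Spearman's rho is 1.0. The sketch below uses the no-ties formula, which applies because none of the values are tied; the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (largest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rho for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))


# Order: AI, Insecurity, Blandness, Hostility.
test_retest = [0.52, 0.46, 0.57, 0.54]
internal = [0.34, 0.15, 0.46, 0.35]
rho = spearman_rho(test_retest, internal)
print(rho)  # 1.0
```

With only four coefficients, even a perfect rho is weak evidence, so the agreement is best read as descriptive.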

Owing to the few items determining IPIT 
scores, special attention was given to the se- 
lection of the response-class subsets required 
to ascertain the internal consistency coeffi- 
cients. The values reported were derived from 
r’s between the 5-choice subgroups from each 
response class which yielded the most nearly 
equivalent subgroup total scores. Applying the Spearman-Brown formula to these half-length r’s, the full-length IPIT score internal consistency
coefficients, based upon 200 Iowa Ss, were: 
AI .34, Insecurity .15, Blandness .46, and 
Hostility .35. 
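The step from half-length to full-length coefficients uses the Spearman-Brown formula, r_full = k·r / (1 + (k − 1)·r) with k = 2. The sketch below applies the formula and its inverse. The half-length r's themselves were not published, so the inverse only shows the value each full-length coefficient implies (for AI, about .20). The projection printed for a 20-card version is hypothetical and assumes any added cards would be parallel to the existing ten.

```python
def spearman_brown(r, k=2.0):
    """Reliability projected for a test k times as long as the one giving r."""
    return k * r / (1 + (k - 1) * r)


def implied_part_r(r_full, k=2.0):
    """Invert spearman_brown: the part-test r that steps up to r_full."""
    return r_full / (k - (k - 1) * r_full)


full = {"AI": 0.34, "Insecurity": 0.15, "Blandness": 0.46, "Hostility": 0.35}
for name, r_full in full.items():
    r_half = implied_part_r(r_full)
    # Hypothetical: doubling the ten-card test again (a 20-card version).
    print(name, round(r_half, 2), round(spearman_brown(r_full), 2))
```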





Table 2

Intercorrelations of Scores on the Iowa Picture Interpretation Test (N = 455)

          Insecurity    Blandness    Hostility
             (I)           (B)          (H)
AI          −.23          −.16         −.66
           [.24*]        [.29**]      [−.57**]
I                         −.56         −.05
                         [−.45**]      [.33**]
B                                      −.35
                                       [−.12]

* p < .01 for two-tailed test.
** p < .001 for two-tailed test.
Intercorrelations 


Product-moment r’s between IPIT scores 
are given in Table 2. Owing to nonindepend- 
ence of the ranks assigned the alternative 
choices for any TAT card, an intrinsic cor- 
relation having an expected value of — .33 
obtains between all IPIT scores.⁴ To facili-
tate conventional interpretation of these data 
a correction for this intrinsic component of 
the empirical intercorrelations was computed. 
This was effectuated by subtracting the ex- 
pected value of the intrinsic covariance 
from the empirical values. These “corrected” 
values, enclosed by brackets in Table 2, rep- 
resent the direction and amount of covariance 
remaining after this operation. A lack of in- 
formation concerning the sampling distribu- 
tions of the intrinsic correlation complicates 
interpretation of these “corrected” values. 
However, there appears little basis to expect 
that this sampling distribution would radi- 
cally depart from normality, and employment 
of Fisher’s r to z transformation in evaluat- 
ing the significance of all corrected values 
serves to offset the consequences of moderate 
departures from normality. 
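Two steps in this paragraph can be sketched. First, the intrinsic value of −.33 is the m = 4 case of −1/(m − 1). If every ordering of a card's four choices is equally likely, the ranks of any two choices correlate exactly −1/3, and summing over independent cards leaves that correlation unchanged because covariances and variances both add. Second, Fisher's r-to-z test of a correlation against zero, using the standard error 1/√(n − 3). The article does not state exactly how the "corrected" values were tested, so applying this test to them is an assumption; the function names are illustrative.

```python
import itertools
import math
from fractions import Fraction


def intrinsic_correlation(m):
    """Exact correlation between the ranks of two fixed choices when each of
    the m! orderings of m choices is equally likely (no real preferences)."""
    perms = list(itertools.permutations(range(1, m + 1)))
    mean = Fraction(m + 1, 2)
    # Rank given to choice 0 and to choice 1 in every possible ordering.
    dev_a = [Fraction(p[0]) - mean for p in perms]
    dev_b = [Fraction(p[1]) - mean for p in perms]
    var = sum(d * d for d in dev_a) / len(perms)
    cov = sum(x * y for x, y in zip(dev_a, dev_b)) / len(perms)
    return cov / var


def fisher_z_test(r, n):
    """Two-tailed test of H0: rho = 0 using z = atanh(r) * sqrt(n - 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


print(intrinsic_correlation(4))  # -1/3
z, p = fisher_z_test(0.33, 455)
print(round(z, 2), p < 0.001)  # 7.29 True
```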


⁴ This value follows from the general formula, −1/(m − 1), where m represents the number of ranks employed. The writer wishes to thank Dr. Guy Stevenson of the Mathematics Department, University of Louisville, who developed this formula.


Discussion


Strong evidence of the functional utility of the IPIT has been provided by a series of recent investigations in which relationships,
predicted in advance, have been found be- 
tween IPIT scores and such diverse behaviors 
as frequency of nonsense syllable anticipa- 
tions (2), electric maze learning (3), and 
performance of an arithmetical task (6). 
Some of the present data seem to furnish less direct validational evidence. Consistent with the widely held view of clinical theorists is the positive relationship found between preferences for Insecurity and Hostility responses. The negative AI-Hostility
correlation is in accord with a priori expecta- 
tions, based on the notion that degree of 
preference for AI responses might be an 
index of the strength of tendencies to con- 
form or subscribe to socially endorsed be- 
havioral patterns while Hostility scores might 
reflect the strength of tendencies to revolt 
against these patterns. The relative unpopu- 
larity of the Blandness responses, which 
ranked as less preferred than the clearly self- 
doubting Insecurity alternatives, permits the 
inference that Ss were not very guarded in 
their IPIT rankings. 

As compared with conventional paper-and- 
pencil personality tests the IPIT’s reliabilities 
seem objectionable. However, the AI score 
has approximately the same test-retest reliability as Morgan (5) and McClelland et al.
(4, p. 188) have reported for the need 
Achievement measure. Apparently the va- 
lidity of these semiprojective measures has 
been sufficient to partially compensate for 
moderate reliability. 

The low internal consistency coefficients indicate that the choices representing the four response classes were relatively heterogeneous as viewed by Ss, despite the experts’ high general agreement that the choices were consistent with their definitions. This heterogeneity may be partially
attributable to the IPIT’s brevity and conse- 
quent small sampling of the choice popula- 
tions from the various response classes. At 
present a longer version of the IPIT is being 
studied in the Iowa laboratories which may 
appreciably improve its reliability character- 
istics. 

It may have been noted that both the IPIT AI and Insecurity response definitions include response classes which the McClelland group employ as components of their need Achievement score. The nonsignificant AI sex difference under the “neutral” conditions prevailing during the IPIT’s administration is at
variance with the reports of McClelland and 
his co-workers of consistent sex differences in 
need Achievement scores under similar conditions. Perhaps this discrepancy is partly attributable to the finding that females expressed a higher preference than males for
Insecurity responses, which contain reference 
to expectations of unsuccessful competition 
with standards of excellence, as contrasted 
with the successful expectations expressed in 
the AI responses. 

The marked AI and Insecurity score inter- 
correlational differences constitute new evi- 
dence that anticipations of successful and 
unsuccessful outcomes of competition with 
standards of excellence are response classes 
possessing divergent behavioral correlates. 
Added to similar earlier evidence (2), these 
data indicate the advisability of a re-exami- 
nation of the position taken by McClelland 
and his colleagues that these two response 
classes may be satisfactorily used as equiva- 
lent components of the same personality 
characteristic. 

Further discussion of the relatively small 
IPIT sex differences appears premature until 
more information about the behavioral cor- 
relates of the scores disclosing sex differences 
becomes available. 

Many variations in the IPIT’s structure are 
possible, e.g., substitution of new response 
classes, employment of different stimuli, 
changes in the number of alternatives of- 
fered, etc. Currently being investigated is a 
scoring system which utilizes a 5-point rating 
scale instead of the forced-choice procedure. 

The predictive successes of this relatively unexploited new approach to personality measurement suggest that the IPIT holds unusual promise of developing important new tools for personality researchers.


Summary 


Development of the Iowa Picture Interpretation Test (IPIT), which provides objective scores on variables labeled Achievement Imagery (AI), Insecurity, Blandness, and Hostility, was described. Normative, reliability, and intercorrelational data based upon 455 college students were discussed.
Implications were drawn from the relation- 
ships observed between IPIT AI and Inse- 
curity scores for the need Achievement meas- 
ure developed by McClelland and his col- 
leagues. 

Possible IPIT modifications, including some 
being currently explored, were noted and it 
was suggested that these seem likely to de- 
velop into valuable new personality research 
tools. 

Received March 2, 1955. 


References 


1. Goodstein, L. D. Interrelationships among several measures of anxiety and hostility. J. consult. Psychol., 1954, 18, 35-39.

2. Hurley, J. R. Performance in verbal learning as a function of instructions and achievement imagery scores. Unpublished doctor’s dissertation, State Univer. of Iowa, 1953.

3. Johnston, R. A. The effects of achievement imagery on maze learning. Unpublished master’s thesis, State Univer. of Iowa, 1954.

4. McClelland, D., Atkinson, J., Clark, R., & Lowell, E. The achievement motive. New York: Appleton-Century-Crofts, 1953.

5. Morgan, H. H. Measuring achievement motivation with “picture interpretations.” J. consult. Psychol., 1953, 17, 289-292.

6. Williams, J. R. The effects on speed of performance of interference tendencies, achievement imagery, and mode of inducing failure. Unpublished doctor’s dissertation, State Univer. of Iowa, 1954.











Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


The Discriminative Ability of the Blacky Pictures 
with Ulcer Patients 


Lewis Bernstein and Philip H. Chase¹


Veterans Administration Hospital, Denver, Colorado 


In a study by Blum and Kaufman (8) it 
is reported that the Blacky Pictures were ad- 
ministered to 14 ulcer patients, all of whom 
indicated strong disturbance on the dimen- 
sion of oral eroticism. However, on the in- 
quiry items for this dimension, half of these 
patients chose disturbed alternatives, while 
the other half picked neutral answers to the 
items. This split suggested to those investi- 
gators that their data were congruent with 
the two ulcer “types” described by Alexander 
and French (2) and Alexander (1). In a 
separate publication, Blum and Hunt (7)
offer these findings as partial evidence for the 
validity of the Blacky Pictures. 

In addition to the small number of cases 
from which these data were derived, there are 
other critical questions that immediately sug- 
gest themselves. No evidence is offered to in- 
dicate that the patients who chose the neu- 
tral inquiry items were diagnosed as reactive 
ulcer types by some independent criterion, or 
that those patients who chose disturbed in- 
quiry items were independently diagnosed as 
primary types. Therefore, could not the even 
split of the 14 patients on the inquiry items 
be regarded as a chance affair and as evidence 
for lack of validity? 

Blum and Kaufman have themselves pointed 
out that their control-group data are based on 
cases from a study by Aronson (3), and have 
not been matched with their ulcer sample. 
Furthermore, there is some reason to believe that Aronson’s control groups used in the Blum and Kaufman study may also be among the groups from whose responses the scoring system was derived. It is stated in the scoring manual (6, p. 1) that among the groups used were thirty paranoid schizophrenics and thirty nonparanoid schizophrenics, the precise make-up of two of Aronson’s groups. If these two groups are Aronson’s, it would follow that, because their responses contributed to the scoring system, their scores on the test following the development of the scoring system are not independent of the scoring criteria. On the other hand, Blum and Kaufman’s ulcer patients were not contributors to the scoring criteria, and thus their independent scores would not be strictly comparable to the control groups.

¹ The senior author is responsible for the manuscript in its present form. The junior author collected the data as part of an M.A. thesis accepted by the Graduate School of the University of Colorado, 1954.

These considerations suggest the need for cross-validation of Blum and Kaufman’s findings. The need is recognized by Blum and Hunt when, in discussing the obtained significant differences in Blum’s original monograph, they state: “. . . all [significant differences] must be considered to have only a tentative sort of status until cross-validation studies, on other samples, reveal which differences and correlations survive the vicissitudes of sampling and random error, and which do not” (7, p. 248). The purpose of this study, then, was to add to the cross-validational evidence on the Blacky Pictures.

Method

The Blacky Pictures were administered to three groups, each composed of twenty male patients hospitalized at the Denver VA Hospital.

The Ulcer patients were selected according to their availability, and all were patients whose peptic ulcers had been definitely demonstrated by accepted medical procedures, and whose diagnoses were uncomplicated by any additional disorder of a psychosomatic nature.

The Psychosomatic, Nonulcer patients were 
also selected according to their availability. 
All patients in this group showed no symp- 
tomatology of a gastrointestinal nature. This 
group consisted of 11 cases of bronchial 
asthma, 5 cases of neurodermatitis, and 4 
cases of essential hypertension. The psycho- 
somatic diagnosis was, in each case, the pa- 
tient’s major diagnosis. 

The Nonpsychosomatic patients were again 
selected according to their availability. These 
patients had diagnoses of a strictly organic 
nature, as follows: fractures, 8 cases; carci- 
nomas, 3 cases; simple lesions and gunshot 
wounds, 3 cases; inguinal hernias, 2 cases; 
amputation, 1 case; chronic osteomyelitis, 1 
case; contracture, 1 case; cholecystitis, 1 
case. If any psychosomatic symptoms were noted, the patient was not accepted for testing, his “doubtful” status being confirmed by his ward physician.

Table 1

Distribution of Age, Marital Status, and Occupational Status of All Groups

                                               Group
                                            Psychosomatic,   Nonpsycho-
Variable                           Ulcer    nonulcer         somatic
Age:
    20-35                            7          7               7
    36-45                            7          7               7
    46-65                            6          6               6
Marital status:
    Married                         12         12              12
    Single                           4          4               4
    Other (divorced, widowed,
      or separated)                  4          4               4
Occupational status:
    Professional and managerial;
      semiprofessional; official     2          2               2
    Clerical and sales               3          3               3
    Service                          1          1               1
    Agricultural                     2          2               2
    Skilled labor                    6          6               6
    Semiskilled labor                5          5               5
    Unskilled labor                  1          1               1

Note. Occupational status categories were modified from the Dictionary of Occupational Titles, I: Definitions of Titles (10).

The patients in the three groups were 
roughly matched for age, marital status, and 
occupational status. The distributions of the 
three groups on these variables are shown in 
Table 1. 

All patients were administered the Blacky 
Pictures according to standard administration 
procedures² (4, 5), usually in groups of from
two to five. In several instances, an indi- 
vidual administration was given to a bedrid- 
den patient. Following administration, each 
protocol was coded, placed in a general pool 
and, at a later date, each was withdrawn at 
random and scored without reference to the 
clinical group from which it came. The scor- 
ing method used was that of Blum (6). 

Several short scoring reliability studies were 
carried out. There were two disagreements 
out of 325 possible disagreements between 
the authors of this paper on five randomly 
selected protocols independently scored. There 
was one disagreement in scoring between one 
of us (P. H. C.) and a clinical graduate stu- 
dent, on two randomly selected protocols 
(130 possible disagreements). Blum kindly 
arranged for one of his associates, Winter,³
to score six of our protocols (two from each 
of our groups). There were eight disagree- 
ments between our scoring and that of Winter 
(390 possible disagreements). Thus, we felt 
that our scoring was consistent with that of 
Blum. 
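Expressed as simple percentage agreement, the three scoring checks work out to roughly 99.4, 99.2, and 97.9 per cent, and each protocol contributes 65 scoring decisions (325/5 = 130/2 = 390/6 = 65). The sketch below does that arithmetic; the function name is illustrative. Percentage agreement does not correct for agreement expected by chance (as, for example, Cohen's kappa would), so it is only a rough index of scoring reliability.

```python
def percent_agreement(disagreements, possible):
    """Share of scoring decisions on which two scorers agreed."""
    return 1 - disagreements / possible


# (disagreements, possible disagreements, protocols) from the text above.
checks = {
    "authors, 5 protocols": (2, 325, 5),
    "author vs. graduate student, 2 protocols": (1, 130, 2),
    "authors vs. Winter, 6 protocols": (8, 390, 6),
}
for label, (d, possible, protocols) in checks.items():
    per_protocol = possible // protocols  # scoring decisions per protocol
    print(f"{label}: {percent_agreement(d, possible):.1%} agreement "
          f"({per_protocol} decisions per protocol)")
```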

Chi square was the statistic used in the 
present study. In order that a defensible as- 
sumption could be made concerning the dis- 
tribution of differences among the clinical 
groups, the data were first changed from the 
over-all dimension scores to difference scores for each pair of matched patients in the two groups under investigation at any one time. A frequency count was then made with reference to the first group of a pair, in terms of whether the scores for the first group were higher, the same as, or lower than those of the comparison group. For example, if the ulcer patient of a pair received a score of ++ on a given dimension, and the other member of the pair (psychosomatic, nonulcer or nonpsychosomatic) received a score of 0 for the dimension, the tally was placed in the “Higher” column, etc.

² Blum (private communication) has called to our attention that in our record booklets our subjects had all of the inquiry items on one dimension available at one time, whereas his own practice was to show one item after another, not allowing the subject to look back or ahead. Although we were able to watch our subjects carefully in the small group administrations, and observed no obvious looking back or ahead, this still remains a departure from Blum’s standard procedure.

³ We wish to express our appreciation to Drs. Gerald S. Blum and William D. Winter for their assistance.
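The tallying procedure can be sketched as follows. The ordinal coding of Blacky scores is an assumption: the article's example mentions scores of ++ and 0, and the intermediate "+" level used here is added for illustration. The article also does not state the expected frequencies behind its chi-square values, so the equal thirds below are a placeholder only, and the matched-pair scores are hypothetical.

```python
# Assumed ordinal coding of Blacky dimension scores (illustrative only).
LEVEL = {"0": 0, "+": 1, "++": 2}


def tally_pairs(first_group, comparison_group):
    """Count matched pairs in which the first group's member scored higher
    than, the same as, or lower than the comparison group's member."""
    if len(first_group) != len(comparison_group):
        raise ValueError("groups must be matched pair by pair")
    counts = {"Higher": 0, "Same": 0, "Lower": 0}
    for a, b in zip(first_group, comparison_group):
        diff = LEVEL[a] - LEVEL[b]
        if diff > 0:
            counts["Higher"] += 1
        elif diff == 0:
            counts["Same"] += 1
        else:
            counts["Lower"] += 1
    return counts


def chi_square(observed, expected):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical scores for four matched pairs on one dimension.
ulcer = ["++", "+", "0", "+"]
control = ["0", "+", "+", "0"]
print(tally_pairs(ulcer, control))  # {'Higher': 2, 'Same': 1, 'Lower': 1}

# Placeholder expectation: 20 pairs split equally among the three cells.
print(chi_square([10, 5, 5], [20 / 3] * 3))
```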


Results 


Table 2 shows the chi-square values for the 
hypothesis of only chance differences between 
expected and observed occurrences. It will be 
noted that, although each intergroup com- 
parison yields significant differences on 3 out 
of the 17 variables, there are no significant 
differences on the dimension of oral eroticism. 
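Each comparison sorts matched pairs into three cells (Higher, Same, Lower), which suggests that each chi-square has 2 degrees of freedom. This is an inference, not something the article states, but it fits the starring in Table 2: with 2 df the .05 and .01 critical values are about 5.99 and 9.21, every single-starred value (6.7, 6.8, 7.5, 7.6) lies between them, and every double-starred value (9.9 and above) exceeds 9.21. With 2 df the upper tail has the closed form exp(−x/2), so the sketch below needs only the standard library; the function names are illustrative.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square statistic with 2 df.

    With 2 degrees of freedom the chi-square distribution is exponential,
    so P(X >= x) = exp(-x / 2) exactly.
    """
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Critical value with 2 df: the x at which exp(-x / 2) equals alpha."""
    return -2 * math.log(alpha)


for alpha in (0.05, 0.01):
    print(alpha, round(chi2_critical_df2(alpha), 2))  # critical values near 5.99 and 9.21
for value in (4.8, 6.7, 9.9):
    print(value, round(chi2_sf_df2(value), 4))
```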

While Blum and Kaufman’s ulcer group all 
gave “strong” oral stories to the first oral 
cartoon, only half of our ulcer group pro- 
duced “strong” stories to this card. As in the 
Blum and Kaufman study, our ulcer patients 
who obtained strong story scores split into 
two groups—roughly half obtaining strong 
inquiry scores, and the other half obtaining 
weak-or-none scores. However, when the 
group who did not obtain strong story scores 
were analyzed, they also split into two halves 
—one obtaining strong inquiry scores, and 
the other not! These findings would appear 
to indicate chance occurrences only. Thus, 
our ulcer group appears dissimilar to the 
Blum and Kaufman ulcer group on the di- 
mension considered most important by those 
investigators. 


Discussion 


This study sought to cross-validate the find- 
ings of Blum and Kaufman that the Blacky 
Pictures could differentiate the strong oral- 
dependent conflicts of ulcer patients from 
other disorders. The keystone of the Blum 
and Kaufman study was the differentiation 
of their ulcer group from their three control 
groups on the dimension of oral eroticism. 
Our findings on the oral eroticism dimension 
did not differentiate the ulcer group from 





Blacky Pictures with Peptic Ulcer Patients 


Table 2 


Chi-square Values for the Hypothesis of Only Chance 
Differences Between Groups 


Chi-square values 


Psycho 
somatic, 





Ulcer vs. Ulcer nonulcer 

Psycho- vs. Non- vs. Non- 

somatic, psycho- psycho- 

Dimension nonulcer somatic somatik 
Oral eroticism 3 1.2 1.9 
Oral sadism 4.4 10.8** 4 

(U)t 
Anal expulsiveness 6.7* 2.7 1.1 
(U) 

Anal retentiveness 10.8** A 14.7** 

(PNI (PNU) 
Oedipal intensity 4.1 1.1 5.1 
Masturbatory guilt 4.1 1.2 1.2 

Castration anxiety A 2.7 7.6*t 
Castration seeking 8 1.9 8 
Identification process 4 3.6 1.6 
Mother decisive figure 7.5* 11.2** 1.1 

(U) (U) 

Father decisive figure 4.8 9.9** 6.8" 

(NP) (PNU 
Desires father identi- 
fication 3.6 1.2 1.1 
Sibling rivalry 1.9 3.4 1.1 
Guilt feelings A 3 4 
Positive ego identifi- 

cation 3 48 48 
Narcissistic love object 3.6 3 
Anaclitic love object 3.6 1.6 3 


* Significant at the .05 level of confidence. 

** Significant at the .01 level of confidence 

+ The letter after the chi-square values refers to the group 
Ulcer (U), Psychosomatic, Nonulcer (PNU), or Nonpsycho 
somatic (NP)—which scored the stronger on the dimension 

t The direction of significance cannot be determined, since 
the greatest frequency fell in the “Same” cell. 


either a psychosomatic, nonulcer group or 
from a nonpsychosomatic group. Shortly after 
our data had been collected, Streitfeld (9) 
published a study in which he contrasted 
twenty ulcer patients with twenty patients 
having nongastrointestinal psychosomatic dis- 
orders. Although testing an hypothesis differ- 
ent from our own, his operations were similar 
to ours. He, too, found that the oral eroticism 
card did not discriminate his ulcer group from 
his control group. 
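The significance flags in the chi-square table (Table 2) can be checked directly. This is a minimal sketch that assumes each chi-square has 2 degrees of freedom. The text does not state the df, but among small df values, 2 is the only one under which every starred value clears the .05 or .01 cutoff and every unstarred value falls short. With 2 df the upper-tail probability has the closed form p = exp(-x/2), so only the standard library is needed. The values used come from the Ulcer vs. nonpsychosomatic column, and the function names are illustrative.

```python
import math

def chi2_upper_tail_df2(x: float) -> float:
    """P(X >= x) for a chi-square variable with 2 degrees of freedom (assumed df)."""
    return math.exp(-x / 2.0)

def significance_marker(x: float) -> str:
    """Table convention: ** for p < .01, * for p < .05, blank otherwise."""
    p = chi2_upper_tail_df2(x)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Critical values implied by inverting p = exp(-x/2)
print(f".05 cutoff: {-2 * math.log(0.05):.2f}")  # 5.99
print(f".01 cutoff: {-2 * math.log(0.01):.2f}")  # 9.21

# Chi-square values from the Ulcer vs. nonpsychosomatic column
ulcer_vs_np = {
    "Oral eroticism": 1.2,
    "Oral sadism": 10.8,
    "Mother decisive figure": 11.2,
    "Father decisive figure": 9.9,
}
for dimension, x in ulcer_vs_np.items():
    p = chi2_upper_tail_df2(x)
    print(f"{dimension:24s} {x:5.1f}  p = {p:.4f} {significance_marker(x)}")
```

Under this assumption oral eroticism lies far from significance (p of about .55), which is the null result the discussion turns on.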

Although these findings cast some doubt on 
the validity of the Blacky Pictures—espe- 
cially the card designed to measure oral eroti- 
cism—there may be alternative explanations 




for the variation of our findings from those of 
Blum and Kaufman. For example, it may be 
that our ulcer group contained patients whose 
oral passivity had been satisfied in an ac- 
ceptable way at the time of testing, perhaps 
because of their dependent status as patients. 

It is possible that our ulcer group differed 
in socioeconomic status from that of Blum 
and Kaufman. Their cases were part of a 
group being studied by a psychoanalyst and, 
on this basis, may represent a higher socio- 
economic group than our patients. On the 
other hand, we may well question whether 
their study by a psychoanalyst had sensitized 
them to their oral needs. 

Certain drawbacks inherent in our own 
study must be considered. Because of the 
relatively small size of our sample of pa- 
tients, and of the limited matching procedures 
we used, validation on other samples and un- 
der other conditions must be recommended. 
However, Streitfeld’s findings (9) lend some 
corroboration to our own. 


Summary 


This study was an attempt to cross-validate 
the findings of Blum and Kaufman with re- 
spect to the responses of ulcer patients to the 
Blacky Pictures. The Blacky Pictures were 
administered to three groups of hospitalized 
patients: an ulcer group; a psychosomatic, 
nonulcer group; and a nonpsychosomatic 
group. Although there were significant differ- 
ences on 3 of 17 dimensions for each inter- 
group comparison, we could not differentiate 
our groups on the basis of oral eroticism,
the dimension considered most important by
Blum and Kaufman. Nor did our data lend 





Lewis Bernstein and Philip H. Chase 


themselves to a meaningful differentiation of 
our ulcer patients into “primary” and “re- 
active” groups, as postulated by Blum and 
Kaufman. 

Although our findings cast some doubt on 
the validity of the Blacky Pictures for dis- 
criminating ulcer patients from other pa- 
tients, alternative explanations are considered. 


Received January 28, 1955. 


References 


1. Alexander, F. Psychosomatic medicine. New 
York: Norton, 1950. 

2. Alexander, F., & French, T. M. Studies in psy- 
chosomatic medicine. New York: Ronald, 
1948. 

3. Aronson, M. L. A study of the Freudian theory 
of paranoia by means of the Blacky Pictures. 
J. proj. Tech., 1953, 17, 3-19. 

4. Blum, G. S. A study of the psychoanalytic theory 
of psychosexual development. Genet. Psychol. 
Monogr., 1949, 39, 3-99. 

5. Blum, G. S. The Blacky Pictures: A technique
for the exploration of personality dynamics. 
New York: Psychological Corp., 1950. 

6. Blum, G. S. Revised scoring system for research 
use of the Blacky Pictures. Ann Arbor: Uni- 
ver. Michigan, Dept. Psychol., 1951. (Mimeo- 
graphed.) 

7. Blum, G. S., & Hunt, H. F. The validity of the
Blacky Pictures. Psychol. Bull., 1952, 49, 238-
250.

8. Blum, G. S., & Kaufman, Jewel B. Two patterns 
of personality dynamics in male ulcer pa- 
tients, as suggested by responses to the Blacky 
Pictures. J. clin. Psychol., 1952, 8, 273-278. 

9. Streitfeld, H. S. Specificity of peptic ulcer to in- 
tense oral conflicts. Psychosom. Med., 1954, 
16, 315-326. 




10. U. S. Department of Labor. Dictionary of occu- 
pational titles. Part I, Definitions of titles. 
Washington: U. S. Government Printing Of- 
fice, 1939. 











Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


Evidence for the Validity of the Children’s Form of 
the Picture-Frustration Study¹


Eugene E. Levitt² and William H. Lyle, Jr.³


Child Welfare Research Station, University of Iowa


The children’s form of the Rosenzweig Pic- 
ture-Frustration Study, a duplicate of the 
adult form in structure and intent, was first 
published in 1948 (8). Evidence for validity 
in the original publication consisted only of 
the presence of expected genetic trends in 
the direction of aggression. Over the age 
span of 4 to 13 years, extrapunitiveness (E)
decreased, and intropunitiveness (I) and impuni-
tiveness (M) increased. The Group Con-
formity Rating (GCR) also showed a pre-
dicted increase with age. In a later study by
Rosenzweig and Rosenzweig (9), a group of 
child clinic patients was found to differ from 
the normative sample in expected ways. The 
patients had a higher percentage of E, a 
lower percentage of I, and a consistently
lower GCR at all age levels. The conclusion 
that the child patients “have less fully ac- 
quired the patterns of socially acceptable re- 
action to frustration” (9, p. 686) appears 
warranted by the data and reflects favorably 
on the validity of the test. 

A comparative study of a delinquent and 
a normal adolescent group by Lindzey and 
Goldwyn (5) found that the normals had a 
significantly higher mean E, but that there
was no difference in the GCR. Within the de- 
linquent group, those with poor institutional 
adjustment had a higher E than those with
good adjustment. It is somewhat difficult to 
reconcile these findings. However, the study 
itself has a number of weak points, including 


1 This investigation was supported by Research
Grant MH-301 from the National Institute of Men- 
tal Health of the National Institutes of Health, U. S. 
Public Health Service. 

2 Now at Chicago Institute for Juvenile Research. 


3 Now at Southern Illinois University, Carbondale, Ill.


the selection of subjects, control methods, 
and statistical procedures.⁴ At best, it can be
regarded as inconclusive. 

An opportunity for an additional investiga- 
tion of the validity of the children’s form of 
the P-F was provided by the development of 
a verbal test of punitiveness in children (6). 
The Problem Situations Test (PST) is a mul- 
tiple-choice scale in which the subject is pre- 
sented with a series of hypothetical situations 
involving misbehaviors of children, and is re- 
quired to deal with each situation, either from 
a child’s point of view or from the point of 


4 While the factor of socioeconomic status appears to have been
carefully controlled in the selection of subjects, there are certain
other possibilities of bias in selection. For example, the contrasting
groups were not actually equated for intelligence. The control
procedure consisted merely of dropping all subjects with an IQ of
less than 85. This, of course, does not ensure that the groups had
equivalent mean IQ's. Nothing is said about how many subjects had
to be dropped because of the IQ criterion. The mean E score of the
delinquent group is 18% higher than that of Rosenzweig's (8)
normative group, and that of the nondelinquents is more than 27%
higher. Data to be presented in the body of this paper, as well as
those of Spache (11) and Rosenzweig and Rosenzweig (9), are in
close agreement with the results for the normative group. The
nondelinquent group tested by Lindzey and Goldwyn is hence probably
atypical. This is also reflected in the mean GCR's, which are nearly
15% lower than the standard for both delinquents and nondelinquents.
The use of the one-tailed test of significance is questionable in
some instances, and clearly improper in another. Lindzey and
Goldwyn's hypothesis states that the delinquent boys will be higher
in E. When the nondelinquents have the higher mean E, it is
impossible to justify the use of a one-tailed test. In addition, the
variances of the distributions of E are higher than usual. If the
variances are also heterogeneous, the difference between means by
the two-tailed test will not reach the .05 level.




view of an adult authority figure. Responses 
are scored as punitive or nonpunitive. Norma- 
tive data for the PST and its significant re- 
lationships with authoritarianism in the child 
and with parental disciplinary procedures are 
described elsewhere (6). A recent study (4) 
has shown that the PST is also negatively re- 
lated to an awareness of the complex, vari- 
able nature of human motivation, and to the 
ability to withhold judgment of others in 
children of grade-school age. An investigation 
of the relationship between the PST and the 
P-F would cast further light on the validity 
of the latter. 


Subjects and Procedure 


The PST had been administered earlier to 
a group of 157 fifth-grade pupils. The mean 
PST score (number of punitive responses) 
was 6.28 with a sigma of 3.9. From this group, 
two subgroups were selected from the tails of 
the distribution. The 24 Highs had scores 
ranging from 9 to 15 with a mean of 11.17. 
The 28 Lows had a range from 0 to 3 with a 
mean of 2.04. The subgroups also differed sig- 
nificantly on an incomplete sentences blank 
designed to measure the child’s view of his 
parents’ disciplinary strictness. A high score 
on the ISB indicates that the child regards 
his parents as highly punitive. The mean ISB 
score for the Highs was 8.13, for the Lows, 
6.64. This difference is not surprising since 
the PST and the ISB have been shown to be 
related (6). 
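The extreme-groups selection just described can be sketched as a filter on the score distribution. This is a minimal sketch, not the authors' procedure: the cutoffs (9 or more punitive responses for Highs, 3 or fewer for Lows) are assumed from the reported score ranges, the score list is hypothetical, and the matching on IQ described next is not modeled.

```python
from statistics import mean

def split_extremes(scores, high_cut=9, low_cut=3):
    """Split PST scores into Highs (>= high_cut) and Lows (<= low_cut).

    Scores between the two cutoffs are dropped, as in an extreme-groups design.
    The default cutoffs are assumptions taken from the reported score ranges.
    """
    highs = [s for s in scores if s >= high_cut]
    lows = [s for s in scores if s <= low_cut]
    return highs, lows

# Hypothetical PST scores (number of punitive responses), not the study's data
pst_scores = [0, 2, 3, 5, 6, 6, 7, 8, 9, 11, 12, 15]
highs, lows = split_extremes(pst_scores)
print(highs, round(mean(highs), 2))  # [9, 11, 12, 15] 11.75
print(lows, round(mean(lows), 2))    # [0, 2, 3] 1.67
```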

The groups were chosen so that the mean 
IQ scores did not differ. This precaution was 
necessary since the PST has a significant cor-
relation of -.29 with IQ, and nothing is
known about the relationship of the chil- 
dren’s P-F to IQ. The mean IQ scores on the 
Otis Self-Administering Test, Intermediate 
Form, were 106.58 for the Highs and 106.32 
for the Lows. There were 14 males and 10 fe- 
males in the High group, and 13 males and 
15 females among the Lows. The slight dis- 
parity in the sex ratios was not regarded as 
important since earlier studies of the P-F (8, 
11) had found no sex differences. 

The P-F was administered to the 52 sub- 
jects in small groups about a year after the 
administration of the PST. There is no rea- 
son to believe that the intervening period 





Eugene E. Levitt and William H. Lyle, Jr. 


would affect test scores significantly one way 
or another. Such changes as might occur 
would seem likely to be minimal. 


Results: Comparisons of Highs and Lows 
with the Normative Group 


The protocols were analyzed for direction 
of aggression, E, J, and M, and for type of 
aggression, O-D (obstacle-dominant), E-D 
(ego-defensive), and N-P (need-persistent). 
The mean frequencies of direction of aggres- 
sion categories for the Highs and Lows are 
compared with Rosenzweig’s normative data 
(8) for the 12- to 13-year-old group in
Table 1. Variances for all comparisons are 
homogeneous. 


Table 1

Comparisons of Highs, Lows, and Normative Group
on Mean Frequencies of E, I, and M

Group          N      E        I        M
Highs          24     11.60    5.42     6.94
Normative      77     9.72     7.08     7.20
Lows           28     8.82     6.54     8.64
t (H-N)               2.17*    4.09**   0.38
t (L-N)               1.23     1.38     2.32†

* Significant at the .05 level.
** Significant at the .01 level.
† Significant at the .02 level.


From Table 1 we see that the Highs are 
significantly more extrapunitive than Rosen- 
zweig’s normative group, the respective means 
being 11.60 and 9.72, yielding a t of 2.17.
The Highs are also less intropunitive than 
the normative group, the means being 5.42 
and 7.08, t = 4.09. The Lows differ from the 
normative group only in being significantly 
more impunitive. The means are 8.64 and 
7.20 and the t is 2.32. Differences between
the Highs and Lows will be discussed in the 
next section. 

The corresponding data for type of aggres- 
sion are shown in Table 2. 

The data of Table 2 show that the Highs
manifest more obstacle-dominance than the
normative group. The means are 4.04 and
3.84 and the t is 2.02. The Lows are less ego-
defensive than the normative group (11.64
and 13.18, t = 3.12) and more need-persist-
ent (8.05 and 7.08, t = 2.05).

There are instances of heterogeneity of 











Validity of the Picture-Frustration Study 


Table 2

Comparisons of Highs, Lows, and Normative Group
on Mean Frequencies of O-D, E-D, and N-P

Group          O-D      E-D      N-P
Highs          4.04     13.35    6.56
Normative      3.84     13.18    7.08
Lows           4.27     11.64    8.05
t (H-N)        2.02*    0.25     2.36†
t (L-N)        1.08     3.12**   2.05*

* Significant at the .05 level.
** Significant at the .01 level.
† Variances not homogeneous; the appropriate t for the .05
level is 2.06 (see text).


variance in the comparisons of frequencies of 
type of aggression. The variance ratio for the 
distributions of O-D for the normative and 
Low groups is 1.75, which is significant at the 
.05 level. The distributions of both E-D and 
N-P for the normative and High groups are 
heterogeneous, the ratios being 1.99 and 2.02. 
In each of the three instances the normative 
group is less variable. Because of the hetero- 
geneous variances, the probability level for 
the comparison of mean frequencies of N-P 
between the normative and High groups has 
to be adjusted, using the method proposed by 
Cochran and Cox (1). The adjusted .05 level
is 2.06, which indicates that the means, 7.08
for the normative group and 6.56 for the
Highs, also differ significantly (t = 2.36).
Differences in type of aggression between the 
Highs and Lows will again be discussed in 
the next section. 
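The Cochran-Cox adjustment replaces the usual critical t with a weighted average of the critical values for each group's own degrees of freedom, each weighted by the squared standard error of its mean, s²/n. This is a minimal sketch: the variance ratio 2.02 (N-P, Highs vs. normative group) and the group sizes 24 and 77 come from the text, while the two-tailed .05 critical values 2.069 (23 df) and 1.992 (76 df) are standard table values supplied here. Because only the ratio of the weights matters, the normative variance is set to 1.

```python
def cochran_cox_critical(var1, n1, t1, var2, n2, t2):
    """Cochran-Cox approximate critical t for two means with unequal variances.

    t1 and t2 are the ordinary critical t values for n1 - 1 and n2 - 1 df;
    each is weighted by the squared standard error of its group mean (var / n).
    """
    w1 = var1 / n1
    w2 = var2 / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)

# N-P comparison, Highs vs. normative group; the Highs are the more variable group.
var_norm = 1.0              # arbitrary scale: only the variance ratio affects the result
var_high = 2.02 * var_norm  # reported variance ratio
t_crit = cochran_cox_critical(var_high, 24, 2.069, var_norm, 77, 1.992)
print(round(t_crit, 2))     # 2.06
```

Because the smaller, more variable High group dominates the weights, the adjusted value sits close to the 23-df critical value. The observed t is then the mean difference divided by the square root of s1²/n1 + s2²/n2, referred to this adjusted value.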

The mean Group Conformity Rating for 
the Lows is 15.07 and 15.00 for the Highs 
based on 22 cartoons as specified by Rosen- 
zweig (8). The corresponding GCR percent- 
ages are 68.50 and 68.17, compared with a 
GCR for the normative group of 63.80. Vari- 
ances are homogeneous for the comparisons 
of means, which yield t’s of 2.03 for the Lows
and 2.22 for the Highs, both significant at 
the .05 level. 


Comparison of Highs and Lows 


The crucial comparisons bearing on the va- 
lidity of the P-F involve the Highs and Lows 
only. The various mean frequencies are listed 
in Tables 1 and 2 and will not be re-pre- 
sented in tabular form in this section. When- 




ever possible, the difference between means 
will be analyzed by means of an F ratio so 
that the variance estimate can be based on the
maximum number of degrees of freedom.

For direction of aggression the respective 
distributions for the two groups have homo- 
geneous variances. The Highs have a mean 
E of 11.60, the Lows a mean of 8.82. The F 
is 7.83, which is significant below the .01 
level. The mean I frequency for the Lows is
6.54, for the Highs, 5.42. The F is 4.74,
which is significant below the .05 level.⁵ The
M means of 8.64 for the Lows and 6.94 for
the Highs yield an F of 3.51, with a signifi-
cance level between .10 and .05.
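With only two groups, the F ratio from a one-way analysis of variance carries the same information as a pooled-variance t test: F = t², with 1 and N - 2 degrees of freedom. The sketch below, using hypothetical scores rather than the study's data, computes both quantities so the identity can be seen; the function names are illustrative.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pooled_t(a, b):
    """Student t for two independent means using the pooled variance estimate."""
    ma, mb = mean(a), mean(b)
    na, nb = len(a), len(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

# Hypothetical E frequencies for two small groups
highs = [12, 14, 10, 13, 11, 15]
lows = [9, 8, 11, 7, 10, 9, 8]
f = one_way_f(highs, lows)
t = pooled_t(highs, lows)
print(round(f, 3), round(t ** 2, 3))  # the two values agree
```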

The variances of the distributions of O-D, 
E-D, and N-P are all significantly different 
for the two groups. The respective ratios are 
2.16, 1.90, and 1.73. The Lows are more vari- 
able on O-D, the Highs more variable on the 
others. The mean O-D frequencies, 4.27 for 
the Lows, 4.04 for the Highs, do not differ 
(t = 0.51). However, the mean E-D fre-
quencies, 13.35 for the Highs and 11.64 for
the Lows, are significantly different. The t is
2.23, where the t required for the .05 level is
2.06 according to the Cochran-Cox formula.
The mean N-P frequencies of 8.05 for the
Lows and 6.56 for the Highs result in a t of
2.07, where the reformulated .05 level re-
quirement is again 2.06.

The mean GCR’s, 68.50% for the Lows
and 68.17% for the Highs, do not differ sig-
nificantly. However, the variances are once
more heterogeneous, the ratio being 1.80, with
the Lows being more variable.


5 Although the earlier studies show no sex differences, the data were
analyzed further to determine whether the significant F ratios could
be due to the intergroup differences in sex proportions. The mean E
frequencies were 10.16 for the females and 10.02 for the males, which
indicates that the preponderance of females in the Low group cannot
have accounted for the difference in E between groups. The females
had a mean I of 6.38, compared to 5.69 for the males. The F is only
1.74, P = .20. Furthermore, males in the two groups do not differ on
I, the Highs having a mean of 5.40 and the Lows a mean of 5.43. We may
conclude that the difference between groups on the I factor is due to
the high mean I frequency of the females in the Low group (7.03) as
compared with the females in the High group (5.96), but that the
difference is not due to sex per se.







Additional Results 


Spache (10) has suggested that it may be 
revealing to break down the P-F data into re- 
sponses to the 13 cartoons in which the frus- 
trater is an adult and the 11 cartoons in which 
the frustrater is a peer. A comparison of the 
Highs and Lows on the basis of this break-
down into A-C (adult frustrater) and C-C (peer
frustrater) items was entirely
negative. The data for direction of aggres- 
sion are shown in Table 3. 


Table 3

Percentage of Direction of Aggression as a Function
of the Frustrating Agent

               E Agent          I Agent          M Agent
Group       Adult   Child    Adult   Child    Adult   Child
Highs         45      55       72      28       56      44
Lows          39      61       72      28       57      43





None of the differences between the Highs 
and Lows is significant. The significance of 
the largest absolute discrepancy, that for E,
does not quite reach the .10 level. 

The corresponding data for type of aggres- 
sion are listed in Table 4. 


Table 4

Percentage of Type of Aggression as a Function
of the Frustrating Agent

               O-D Agent        E-D Agent        N-P Agent
Group       Adult   Child    Adult   Child    Adult   Child
Highs       (values illegible in source)
Lows          19      16       43      56       38      28





Again, none of the differences is significant. 
A direction-by-type breakdown showing the 
percentages of type of aggression for each of 
the three directions of aggression is similarly 
unrevealing.⁶


6 The absence of differences between our groups
using the Spache analysis does not necessarily bear 
on its clinical value. The analysis shows at least that 
children in both groups tend to respond more extra- 
punitively toward peers and more intropunitively to- 
ward adults in a frustrating situation. There are also 
indications that need-persistence and obstacle-domi- 







A breakdown of the GCR’s also shows no 
intergroup differences. The percentages based 
on 11 cartoons of each category are as fol- 
lows: A-C: Highs, 65.73%, Lows, 70.46%; 
C-C: Highs, 70.64%, Lows, 66.55%. It is in- 
teresting that the over-all GCR breakdown is 
not in accord with Spache’s data. Of all the 
conforming responses, 50.06% are C-C and 
49.94% are A-C, while Spache shows a pre- 
ponderance in favor of the C-C category. 

It would be of interest to estimate the re- 
lationship between the P-F indices and in- 
telligence, information which has heretofore 
been lacking. Since it is evident that the High 
and Low groups are drawn from different 
populations as far as their performance on 
the P-F is concerned, it is improper to pool 
the two groups for any statistical analysis in-
volving P-F measurements. Correlations be-
tween E, I, and M, and IQ scores on the Otis
were therefore computed separately for each 
group. The coefficients are shown in Table 5. 
Significance of differences between groups was 
computed by means of Fisher’s z. 


Table 5

Correlations Between the P-F and IQ

Group              E        I              M        GCR
Highs              -.29     .22            .25      -.24
Lows               .17      [illegible]    -.55**   .38*
z (difference)     1.58     1.28           2.97**   2.17*

* Significant at the .05 level.
** Significant at the .01 level.

Since the coefficients for the Highs and
Lows on E and I do not differ significantly,
we can average them using Fisher’s z on the
assumption that they were obtained from
equally correlated populations. The average r
for E is -.05, which is not significant. The
average r for I is .41, significant below the
.01 level. These averages may be regarded as
estimates of the relationship over the com-
plete range of E and I scores.
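The two Fisher z operations used here, testing the difference between two independent correlations and averaging correlations, can be sketched for the E column of Table 5, where the coefficients (-.29 for the Highs, .17 for the Lows) are legible. This is a minimal sketch that assumes the common n - 3 weighting; the text does not state which weights were used, and the function names are illustrative.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r2) - fisher_z(r1)) / se

def average_correlation(pairs):
    """Average (r, n) pairs in the z metric, weighting each by n - 3, then back-transform."""
    num = sum((n - 3) * fisher_z(r) for r, n in pairs)
    den = sum(n - 3 for _, n in pairs)
    return math.tanh(num / den)

highs_r, highs_n = -0.29, 24
lows_r, lows_n = 0.17, 28
print(round(compare_correlations(highs_r, highs_n, lows_r, lows_n), 2))       # 1.59
print(round(average_correlation([(highs_r, highs_n), (lows_r, lows_n)]), 2))  # -0.04
```

These results, about 1.59 and -.04, are close to the printed 1.58 and -.05; gaps in the last digit are consistent with rounding of the printed coefficients or a different choice of weights.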


Discussion 


The data indicate that both the Highs and 
the Lows differ from Rosenzweig’s normative 


mance are more common types of reaction to adults 
and that ego-defensiveness is more typical of the
reaction to peers. These findings agree approximately
with those of Spache (11). 










group in direction and type of aggression. 
The differences are largely in the direction to 
be expected if the PST and the P-F are re-
lated measures. Groups selected from the ex-
tremes of distribution x should score higher
and lower on test y than the mean of y for a
complete sample if we are to reject the null
hypothesis for the value of r_xy. The differ-
ences between our groups and the normative
group indicate, therefore, that the PST and
the P-F are correlated, and thus furnish in-
direct evidence for the validity of the latter.

The fact that both the Highs and the Lows 
have significantly higher GCR’s than the nor- 
mative group suggests the possibility that the 
GCR may be curvilinearly related to puni- 
tiveness. The variance of the distribution of 
GCR’s is abnormally low in both groups. For 
the mean GCR frequencies of 15.07 and 15.00 
the corresponding sigmas are 2.34 and 1.74. 
The relatively low variances indicate that the 
ability of the GCR to discriminate among in- 
dividuals is poor and that its internal con- 
sistency is probably inadequate. It is similar 
in this respect to the GCR derived from the 
adult form of the P-F. The Kuder-Richardson 
reliability of the adult form for a sample of 
130 adult males was -.04 (12). In the present
study the comparable reliabilities are .14
for the Lows and -.60 for the Highs. As
Taylor and Taylor (12) point out, a negative 
reliability coefficient obtained by means of 
the Kuder-Richardson formula indicates that 
the true reliability probably does not differ 
significantly from zero. 
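A negative Kuder-Richardson coefficient arises when items covary negatively, so that the variance of the total score is smaller than the sum of the item variances. This is a minimal sketch of formula 20 (KR-20) for items scored 0 or 1, applied to two small hypothetical response matrices. The text does not specify which Kuder-Richardson formula was used, and the GCR's actual item scoring is not modeled here.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson formula 20; responses is a list of subjects, each a list of 0/1 item scores."""
    k = len(responses[0])
    n = len(responses)
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion scoring 1 on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / total_var)

# Hypothetical data: items that hang together vs. items that tend to cancel out
consistent = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
conflicting = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 1, 0]]
print(round(kr20(consistent), 2))   # 0.79
print(round(kr20(conflicting), 2))  # -3.0
```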

The results also clearly show that perform- 
ance on the PST is related to extrapunitive- 
ness, intropunitiveness, ego-defensiveness, and 
need-persistence as measured by the P-F. The 
PST is also related to all three types of ag- 
gression as far as variability goes, those scor- 
ing low on the PST being more variable in 
obstacle-dominance and less variable in ego- 
defensiveness and need-persistence. Specifi- 
cally, extrapunitiveness (i.e., punitiveness) 
on the PST is related to extrapunitiveness 
and ego-defensiveness on the P-F, and non- 
punitiveness on the PST is correspondingly 
related to intropunitiveness and need-persist- 
ence on the P-F. It is not possible to respond 
intropunitively on the PST; the nonpunitive 
response does not correspond directly to any 


scoring category of the P-F. However, in the 
sense that intropunitiveness and impunitive- 
ness are not outgoing punitive responses, we 
would expect them to be related to nonpuni- 
tiveness on the PST, as opposed to the puni- 
tive PST response, which is clearly extra- 
punitive in nature. 

In an analysis of both direction and type 
of aggression it is evident that the scoring 
will not be independent. Each trichotomy has 
only two degrees of freedom. When two cate- 
gories have been scored, the third is already 
determined. Rosenzweig himself (7) has 
pointed out this limitation and the fact that 
it raises questions of interpretation. 
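The interdependence can be checked against the printed means. If each of the 24 cartoons contributes one point divided among the scoring categories (an assumption here, though one the row sums bear out), then the E, I, and M means must add to about 24, as must the O-D, E-D, and N-P means, and once two categories are known the third is fixed. This is a sketch using the means from Tables 1 and 2; the variable names are illustrative.

```python
# Mean frequencies per child, copied from Tables 1 and 2
table1 = {  # direction of aggression: E, I, M
    "Highs": (11.60, 5.42, 6.94),
    "Normative": (9.72, 7.08, 7.20),
    "Lows": (8.82, 6.54, 8.64),
}
table2 = {  # type of aggression: O-D, E-D, N-P
    "Highs": (4.04, 13.35, 6.56),
    "Normative": (3.84, 13.18, 7.08),
    "Lows": (4.27, 11.64, 8.05),
}
N_ITEMS = 24  # cartoons in the children's form (13 adult and 11 peer frustraters)

for group, (e, i, m) in table1.items():
    implied_m = N_ITEMS - e - i  # M is determined once E and I are known
    print(f"{group:9s} E+I+M = {e + i + m:5.2f}  "
          f"M implied by E and I = {implied_m:5.2f}  printed M = {m:.2f}")

for group, scores in table2.items():
    print(f"{group:9s} O-D+E-D+N-P = {sum(scores):5.2f}")
```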

It is possible that the solution to this prob- 
lem lies in a pattern analysis rather than a 
category analysis. For the purposes of the 
present study the important consideration is 
that the Highs are more extrapunitive and 
less intropunitive than the Lows. These find- 
ings are not affected by the interdependence 
of categories, since two of the three may be 
considered independent. 

The same consideration applies to the type 
of aggression, where again the results reflect 
favorably on the validity of the P-F. We 
would certainly expect that children raised in
an authoritarian atmosphere, which the study by
Lyle and Levitt (6) indicates is characteristic of
the Highs, would manifest more ego-defensiveness.
Ego-defensiveness occurs to a
greater degree in younger children (8) and is 
hence more typical of the authoritarian child 
who has what Frenkel-Brunswik (2) calls 
“underdeveloped self-reliance.” In the same 
vein, the Highs are less need-persistent, a 
form of behavior which increases in fre- 
quency with age. There is no difference in 
obstacle-dominance, which is not subject to 
a developmental trend. 

The differences in variability are difficult 
to interpret. It is likely that unknown factors 
such as religion, socioeconomic status, or in- 
trafamily relationships underlie these differ- 
ences. The heterogeneity of variance is a less 
complex matter in the case of the GCR’s 
where the means do not differ. The more 
democratic home permits the child to express 
a relatively large degree of nonconformity, or 
inclines him toward a greater degree of con- 
formity through identification with the benign 










parents. The children of authoritarian homes 
cling closely to a fixed point of conformity in 
the interests of emotional security. 

The psychological significance of the cor- 
relations of intelligence and E, J, M, and 
GCR is somewhat difficult to assess because 
of the discreteness of the groups. Certainly 
they suggest that the P-F is not entirely in- 
dependent of IQ, a conclusion which also ap- 
pears applicable to the adult form, as the 
study by Karlin and Schwartz (3) indicates. 
The significant relationship with I is reason-
able; it suggests that learning the culturally
accepted, nonaggressive reaction is in part a 
function of intelligence. The peculiar sign re- 
versal of three of the four sets of coefficients 
for Highs and Lows, E, M, and GCR, hints 
at the possibility of curvilinear relationships. 
The potential relationship between IQ and 
the P-F further emphasizes the necessity for 
controlling the former in studies of the P-F 
and points up the probability of detrimental
effects of inadequate controlling, as in the 
Lindzey-Goldwyn study (5). 


Summary 


Twenty-four high and twenty-eight low 
scorers on the Problem Situations Test, a 
verbal measure of punitiveness in children, 
were administered the children’s form of the 
Rosenzweig Picture-Frustration Study. Ma- 
jor findings in general reflect favorably on 
the validity of the P-F; the Highs on the 
PST gave significantly more extrapunitive 
responses and significantly fewer intropuni- 
tive responses on the P-F than the Lows. 
In addition, the Highs were more frequently 
ego-defensive in their responses, and less 
need-persistent. Both groups also showed ex- 
pected differences from Rosenzweig’s norma- 
tive group. Group Conformity Ratings did not 
differ for the two groups and were so limited 
in variability as to raise serious questions of 
the utility of the GCR as an index. Break- 
down of responses according to the frustrat- 
ing agent disclosed no significant differences 
between groups. There is some indication that 


performance on the P-F is related to intel-
ligence (average r = .41 between IQ and
intropunitiveness). The nature of the data
suggests that E, M, and GCR may be curvi-
linearly related to IQ. Finally, the lack of
homogeneity of variance between groups for 
a number of P-F measures suggests the op- 
eration of presently unidentifiable factors un- 
derlying performance on the P-F. 


Received April 19, 1955. 
Early publication. 


References 


1. Cochran, W. G., & Cox, Gertrude M. Experi- 
mental designs. New York: Wiley, 1950. 

2. Frenkel-Brunswik, Else. Interaction of psycho- 
logical and sociological factors in political be- 
havior. Amer. polit. Sci. Rev., 1952, 46, 44-65. 

3. Karlin, L., & Schwartz, M. M. Social and general 
intelligence and performance on the Rosen- 
zweig Picture-Frustration Study. J. consult. 
Psychol., 1953, 17, 293-296. 

4. Levitt, E. E. Punitiveness, “causality,” and in- 
telligence of elementary school children. J. 
educ. Psychol., in press. 

5. Lindzey, G., & Goldwyn, R. M. Validity of the 
Rosenzweig Picture-Frustration Study. J. Pers., 
1954, 22, 519-547. 

6. Lyle, W. H., & Levitt, E. E. Punitiveness, au- 
thoritarianism and parental discipline of grade 
school children. J. abnorm. soc. Psychol., 1955, 
51, 42-46. 

7. Rosenzweig, S. Some problems relating to re-
search on the Rosenzweig Picture-Frustration
Study. J. Pers., 1950, 18, 303-305. 

8. Rosenzweig, S., Fleming, Edith E., & Rosen-
zweig, Louise. The children’s form of the
Rosenzweig Picture-Frustration Study. J. Psy- 
chol., 1948, 26, 141-191. 

9. Rosenzweig, S., & Rosenzweig, Louise. Aggres- 
sion in problem children and normals as 
evaluated by the Rosenzweig Picture-Frustra- 
tion Study. J. abnorm. soc. Psychol., 1952, 47, 
683-687. 

10. Spache, G. Differential scoring of the Rosen- 
zweig Picture-Frustration Study. J. clin. Psy- 
chol., 1950, 6, 406-408. 

11. Spache, G. Sex differences in the Rosenzweig 
P-F Study, children’s form. J. clin. Psychol, 

1951, 7, 235-238. 

12. Taylor, M. V., & Taylor, ©. M. Internal con- 
sistency of the Group Conformity Rating of 
the Rosenzweig Picture-Frustration Study. J. 
consult. Psychol., 1951, 15, 250-252. 









Perceptual Tests and Acute and Chronic Status as 
Predictors of Improvement in Psychotic Patients¹


Sylvia L. Sonder 


Austin State Hospital 


The prediction of outcome of illness in 
mental patients has long been of interest and 
importance to clinical psychologists. Attempts 
at prediction have met with partial success 
and with contradictory and paradoxical find- 
ings. Some studies (2, 4, 8, 9) have indicated 
that good performance in psychological tests 
is positively related to favorable prognosis. 
Others (1, 3, 5, 13) have shown that good 
performance is negatively related to outcome. 
Literature pertinent to this problem has been 
reviewed by Windle and Hamwi (12) and 
Zubin, Windle, and Hamwi (14). 

It occurred to Zubin, Windle, and Hamwi 
(14) that the inverse prognostic relationship 
might be a function of the chronic status of 
psychotic patients. Their studies indicated 
that for acute (early) patients, the relation- 
ship between test performance and outcome 
is positive and for chronic patients, negative. 
The following predictions were proposed for 
the present study: 

1. Acute (early) patients with good scores 
on psychological tests will improve, those 
with poor scores will not improve. 

2. Chronic patients with poor scores on 
psychological tests will improve, those with 
good scores will not improve. 
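Taken together, the two predictions form a decision rule in which chronic status reverses the meaning of a good test score. This is a minimal sketch: the function name and boolean inputs are hypothetical, and how a score is classed as good or poor (the critical points described under Method and Procedure) is left outside the function.

```python
def predict_improvement(acute: bool, good_scores: bool) -> bool:
    """Predicted outcome under the two stated hypotheses.

    Acute (early) patients: good test scores -> improve, poor scores -> not improve.
    Chronic patients: the relationship is reversed.
    """
    return good_scores if acute else not good_scores

for acute in (True, False):
    for good in (True, False):
        status = "acute" if acute else "chronic"
        perf = "good" if good else "poor"
        outcome = "improve" if predict_improvement(acute, good) else "not improve"
        print(f"{status:7s} / {perf} scores -> {outcome}")
```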


1 Adapted from a Ph.D. dissertation submitted to 
the Department of Psychology, University of Texas. 
Appreciation is expressed to Dr. Harry Helson and 
Dr. Wayne H. Holtzman for permitting the use of 
the special techniques they have developed and for 
their valuable suggestions; to Dr. Philip Worchel for 
his assistance and encouragement; to Mr. Charles H. 
McFadden for his cooperation and participation; to 
the Administration and Staff of the Austin State 
Hospital where the study was conducted; and to 
Dr. Frances E. Davis and Dr. Manual J. Otero, of 
the medical staff, for their participation in evaluat- 
ing the progress of the patients. 




Method and Procedure 


In order to test the predictions, tests were 
assembled that could distinguish between pa- 
tients and nonpatients matched in pairs by 
age, sex, and educational level. Differences 
in performance of patients and nonpatients 
would then be attributable to other variables. 
Psychological tests closely related to percep- 
tual functions and amenable to quantitative 
treatment were selected, and critical points 
for good and poor performance were deter- 
mined for each test, so as to differentiate be- 
tween the patient and nonpatient samples. 
Each subject in the patient group was tested 
within five days after he entered the hospital. 
He was given an orientation questionnaire to 
determine his willingness and ability to at- 
tend to the testing situation. The question- 
naire was followed by the Bender Gestalt, the 
Holtzman Form Perception Test, and the 
Judgment of Lifted Weights, in the order 
mentioned. The same order of tests was main- 
tained throughout the experiment. Predic- 
tions of outcome (improved-unimproved) were 
made for the patient group on the combined 
basis of acute or chronic status and good or 
poor test performance. Physicians who were
unaware of the conditions and results of the
study judged whether each patient was improved
or unimproved, and these judgments were recorded
at the end of a 90-day hospitalization period.


Subjects 


The patient sample in this study was drawn 
from lists of new admissions to the Austin 
State Hospital. Patients with apparent and 
obvious organic defects, those mentally defec- 
tive, those over 50 years of age, and alcoholics 
and drug addicts were excluded. 




The sample included 20 men ranging in age 
from 16 to 49 and 22 women from 17 to 50. 
In educational level the men ranged from 
fourth grade to college graduation and the 
women from sixth grade to college gradua- 
tion. At the time of testing the patients had 
not yet been diagnosed by the staff. Sub- 
sequently 20 patients were diagnosed as 
schizophrenic reaction, paranoid type; 2 as 
schizophrenic reaction, catatonic type; 6 
as schizophrenic reaction, mixed type; 2 as 
schizophrenic reaction, chronic undifferenti- 
ated; 3 as schizophrenic reaction, acute un- 
differentiated; 3 as schizophrenic reaction, 
hebephrenic type; 1 as manic-depressive, 
depressed; 2 as schizoid personality; 1 as 
schizo-affective disorder; 1 as depressive re- 
action; and 1 as sociopathic personality. The 
total sample consisted of 42 subjects, 2 of 
whom were not matched to the nonpatient 
sample and were not included in the patient- 
nonpatient comparison but were included in 
the prediction study. 

The nonpatient or control sample num- 
bered 40, 20 men and 20 women, and con- 
sisted of applicants for jobs, and employees 
at the Austin State Hospital. The applicants 
sought employment as attendants, laboratory 
technicians, nurses, typists, secretaries, ac- 
countants, etc. In an effort to select a non- 
patient sample of comparatively “normal” 
status, applicants for jobs and employees 
were also given the MMPI. All with T scores 
above 70 in any one of the scales, including 
the validity scales, were excluded. Although 
the presence of one T score above 70 does not 
necessarily imply that the record is distinctly 
pathological, this criterion was used in order 
to achieve a more rigorous standard for se- 
lection of a “normal” sample. 
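The screening criterion for the nonpatient sample can be expressed as a single check over a profile's T scores. The sketch below is a hypothetical helper and was not part of the study. It reads "above 70" literally, so a T score of exactly 70 passes, and the example profiles are invented for illustration.

```python
def passes_mmpi_screen(t_scores, cutoff=70.0):
    """True if no scale (clinical or validity) has a T score above the cutoff.

    Hypothetical helper: a T score exactly at the cutoff is treated as passing.
    """
    return all(t <= cutoff for t in t_scores.values())


# Illustrative T scores only, keyed by the usual MMPI scale abbreviations.
applicant_a = {"L": 50, "F": 55, "K": 60, "Hs": 58, "D": 66, "Pt": 70}
applicant_b = {"L": 48, "F": 62, "K": 52, "Hs": 55, "D": 74, "Pt": 61}
print(passes_mmpi_screen(applicant_a))  # True: 70 is not "above 70"
print(passes_mmpi_screen(applicant_b))  # False: D exceeds 70
```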


Individual Tests: Selection and Description 


An effort was made to select tests compara- 
tively limited in complexity and reflecting a 
preponderance of perceptual processes. The 
tests chosen were selected as representative of 
visual and kinesthetic areas, because many of 
the perceptual and conceptual distortions of 
psychotic patients are directly concerned with 
these areas. Three tests yielding five scores 
were chosen. Visual perception was repre-
sented by the Bender Gestalt test and the
Holtzman Form Perception test. Kinesthetic
perception and judgment were tested by means
of judgments of lifted weights.

Sylvia L. Sonder

Bender Gestalt Test. The standard method 
of administration was followed and the test, 
scored according to the Pascal-Suttell method 
(10), yielded one score, the z score. 

Holtzman Form Perception Test. This test 
was originally designed to measure the degree 
of perceptual blocking induced by color dis- 
traction. Details concerning construction, ad- 
ministration, and its application to the study 
of personality variables were reported by 
Holtzman (7). It yields a quantitative ap- 
proach to specific aspects of visual perception. 
The test as used here consisted of 15 cards,
7 of which are composed of incomplete black-
on-white line drawings, and 7 of which have
bright transparent color blotches and patterns
superimposed on the original black-on-white
drawings. The fifteenth card, black-on-white,
is a trial card and was the first card presented.
The test yielded three scores: median reaction
time score in seconds to the black-and-white
cards (BW score); median reaction time score
in seconds to the colored cards (C score);
and a content score (Ct score) introduced in
this study. The maximum viewing time for
each card was 30 seconds. The higher the re-
action time scores and the lower the content
score, the more deviant the performance.

Judgment of Lifted Weights. The use of
psychophysical measures in clinical areas has
been rather limited. The work of Helson and
Kaplan (6) and Teuber, Bender, and Bat-
tersby (11) has drawn attention to the mean-
ingful contributions accruing from the clinical
application of such measures. Judgment of
lifted weights was included because it reflects
flexibility or rigidity of adjustment in kines-
thetic areas, and because it involves the abil-
ity to make discriminatory judgments in these
areas. The method of absolute judgment (6)
was employed, wherein each series weight is
judged in terms of an 11-step qualitative cate- 
gory scale ranging from extremely heavy 
through medium to extremely light. A new 
method of scoring, based on deviations from 
the normal, was introduced in this study. 
Deviations were weighted in accordance with 
their pathological significance. The weights
were summed and yielded a score called the
g score.

Perceptual Tests as Predictors of Improvement


Criteria 
Acute-Chronic Status 
There are many interpretations of acute- 
ness and chronicity of disease processes, of 
which duration of illness and exacerbation of 
symptoms are most commonly employed. Of 
the two, duration of illness is the more diffi- 
cult to assess. Most personality theorists will 
agree that the foundations of personality mal- 
adjustment are laid in early childhood. If 
this is true, every adult individual who be- 
comes psychotic can be considered chronic 
and the duration of illness would be meas- 
ured by his age, or his temporal distance from 
early childhood. On the other hand, one may 
attempt to evaluate acuteness and chronicity 
on the basis of other criteria such as exacerba- 
tion of symptoms or excessive difficulty in 
managing personal affairs and dealing ade- 
quately with the environment. It is extremely 
difficult to ascertain, however, the exact time 
at which exacerbation of symptoms or man- 
agement failure occurs. Therefore, in this 
study an effort was made to define the terms 
acute and chronic objectively and operation-
ally. Patients hospitalized for the first time in
a mental hospital were for the purposes of 
the study called acute. Patients who had had 
previous periods of hospitalization were called 
chronic. This definition assumes that chro- 
nicity is a function of duration of illness and 
that a longer period of time has elapsed since 
the initial evidence of symptoms (hospitali- 
zation) in the chronic patients than in the 
acute patients. 


Improved-Unimproved Status 


Patients who were judged by the physicians 
at the end of a 90-day period of hospitaliza- 
tion as able to manage their own affairs and 
withstand a reasonable amount of environ- 
mental stress were regarded as improved. A 
90-day period was selected because most of 
the commitments to the Austin State Hos- 
pital are made on a 90-day basis. Those who 
showed no such capability and those whose 
adjustment was limited to a superficial ac- 
ceptance of hospital routine were regarded as 
unimproved. 


Five categories of improved-unimproved
status were set up. They were: 1. unim-
proved; 2. improved with poor prognosis; 3.
improved with fair to poor prognosis; 4. im-
proved with fair to good prognosis; and 5.
improved with good prognosis. In these judg-
ment categories, prognosis refers to the de-
gree of confidence expressed by the physi-
cians in the patient's ability to maintain his
improvement over an extensive period of time.
For example, improved with poor prognosis
means that although the patient appears to
have improved, his improved status is so un-
reliable that he will be expected to return to
the hospital in a fairly short time. Improved
with fair to poor prognosis means that the pa-
tient's improved status is tenuous and unless
his home environment is "protected" and cus-
todial, he will be expected to return to the
hospital shortly. Improved with fair to good
prognosis and improved with good prognosis
mean that the patient will be expected to
maintain his improved status and will be able
to deal with his own affairs and environmen-
tal stresses in such a manner that further hos-
pitalization will not be required. Categories
1, 2, and 3 are considered Unimproved. Cate-
gories 4 and 5 are considered Improved.
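The five-point physician judgment is collapsed into the binary criterion used throughout the analysis. The sketch below shows that mapping. The dictionary and function names are illustrative only.

```python
# Physician judgment categories, numbered as in the text.
CATEGORIES = {
    1: "unimproved",
    2: "improved with poor prognosis",
    3: "improved with fair to poor prognosis",
    4: "improved with fair to good prognosis",
    5: "improved with good prognosis",
}


def dichotomize(category):
    """Collapse the five-point judgment into the study's binary criterion."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be 1-5, got {category!r}")
    # Categories 1-3 count as Unimproved, 4-5 as Improved.
    return "Improved" if category >= 4 else "Unimproved"


print([dichotomize(c) for c in sorted(CATEGORIES)])
```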


Poor Test Performance-Good Test Performance


The critical points for pass or fail for each
of the five tests were determined by the
graphic method described by Guilford. The
critical number of fail or poor score areas out
of the total of five that differentiated the pa-
tient and nonpatient groups was determined
by the same technique. The critical points
were required not only to distinguish between
the two groups, but also to be so placed as to
include as few members of the nonpatient
(control) group in the deviant area as pos-
sible. The critical point was found to be
three poor-score areas. With three or more
poor-score areas the test performance was
designated as poor; with zero through two
poor-score areas, the test performance was
called good.
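The classification rule can be sketched as a count of poor-score areas against the critical points listed in Table 1, followed by the three-area cutoff. This Python version rests on two assumptions that are not stated in the text. First, a score exactly at a critical point counts as good. Second, for the content score (Ct) the deviant side is the low side, because the text notes that lower content scores are more deviant. The subject's scores in the example are hypothetical.

```python
# Critical points from Table 1, with the side that counts as "poor".
# ASSUMPTIONS: a score exactly at the critical point counts as good;
# for the content score (Ct) lower values are the deviant side.
CRITICAL_POINTS = {
    "z": (79.0, "high"),   # Bender Gestalt (Pascal-Suttell z score)
    "g": (7.0, "high"),    # Judgment of Lifted Weights
    "BW": (2.0, "high"),   # Holtzman black-and-white median RT (s)
    "C": (7.0, "high"),    # Holtzman color median RT (s)
    "Ct": (20.0, "low"),   # Holtzman content score
}
POOR_AREA_CUTOFF = 3  # three or more poor-score areas => poor overall


def poor_score_areas(scores):
    """Count how many of the five scores fall on the deviant side."""
    count = 0
    for name, (point, deviant_side) in CRITICAL_POINTS.items():
        value = scores[name]
        if deviant_side == "high" and value > point:
            count += 1
        elif deviant_side == "low" and value < point:
            count += 1
    return count


def overall_performance(scores):
    return "poor" if poor_score_areas(scores) >= POOR_AREA_CUTOFF else "good"


# Hypothetical subject: z, g, and Ct fall in poor areas.
subject = {"z": 85.0, "g": 9.0, "BW": 1.5, "C": 6.0, "Ct": 18.0}
print(poor_score_areas(subject), overall_performance(subject))  # 3 poor
```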


Results

The five test variables distinguished clearly
between the patient and nonpatient groups,
as can be seen in Table 1. In this table the z
score represents the Bender Gestalt score; the
g score, the Judgment of Lifted Weights; and
the BW, C, and Ct scores, the black-and-white
reaction time, the color reaction time, and the
content scores of the Holtzman Form Perception
test, respectively. The t values presented in
Table 1 were significant beyond the .01 level.

Table 1

Means, Standard Deviations, Critical Points, and
t Values of Scores in the 5 Tests of the
Patient and Nonpatient Groups

              Patients        Nonpatients     Critical
Score      Mean     SD      Mean     SD       point        t
z          88.2    22.3     66.3     9.6       79.0     1.25**
g           9.7     7.0      4.9     2.9        7.0     6.60**
BW           ?      2.4      1.7     0.7        2.0     5.83**
C          15.0     9.1      7.6     5.8        7.0     6.71**
Ct         17.2     4.8     21.8     3.3       20.0     8.78**

** Significant beyond the .01 level.

Critical points in each of the five test vari- 
ables distinguishing between good and poor 
performance on each of the tests were effi- 
cient, as indicated by chi-square analysis. 
The distribution of good and poor scores in 
both groups of subjects above or below the 
critical points departed significantly from 
chance expectancy as illustrated in Table 2. 
The Yates correction for continuity was ap- 
plied in each case where the expected fre- 
quency fell below 10. 
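For a 2 x 2 table with 1 df, the chi-square statistic and the Yates correction can be sketched with the shortcut formula N(|ad − bc| − N/2)² / (row and column totals). The function applies the correction automatically when any expected frequency is below 10, which follows the rule stated above. The function name is illustrative. Run on the counts in Tables 2, 4, and 5, it comes close to the printed values, though it does not match all of them exactly: it gives 3.60 for the acute patients where 3.67 is printed, and 8.42 for the chronic patients where 8.38 is printed. The differences presumably come from rounding or calculation details in the original.

```python
def chi_square_2x2(a, b, c, d, yates=None):
    """Pearson chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]].

    If yates is None, the Yates continuity correction is applied
    whenever any expected cell frequency falls below 10.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    expected = [r * k / n for r in rows for k in cols]
    if yates is None:
        yates = min(expected) < 10
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction
    return n * diff ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])


# Table 2, Bender z: patients 26 poor / 14 good; nonpatients 1 poor / 39 good
print(round(chi_square_2x2(26, 14, 1, 39), 2))  # 34.94 (no correction)
# Table 4: patients 12 good / 28 poor; nonpatients 38 good / 2 poor
print(round(chi_square_2x2(12, 28, 38, 2), 2))  # 36.05
# Table 5, chronic: poor perf. 9 improved / 0 not; good perf. 2 / 7
print(round(chi_square_2x2(9, 0, 2, 7), 2))     # 8.42 (Yates applied)
```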

Table 2

Number of Patients and Nonpatients Above (Poor
Performance) or Below (Good Performance)
the Critical Points in Each of the 5 Tests

                   Number of         Number of
                   patients          nonpatients
        Crit.    Poor     Good     Poor     Good      Chi
Test    point    perf.    perf.    perf.    perf.     square     p
z        79       26       14        1       39       34.9     .001
g         7       25       15        6       34       18.9     .001
BW        2       25       15        6       34       18.9     .001
C         7       28       12       17       23        6.1     .02
Ct       20       24       16       11       29        8.5     .01

Table 3 presents the percentage distribu-
tion of patients and nonpatients categorized
according to good and poor performance,
based on the critical points in the five tests.
The critical points appear to separate the pa-
tients and nonpatients efficiently with the ex-
ception of the critical point for the C score.
The results of the chi-square analysis pre- 
sented in Table 4 justify the use of three or 
more poor areas as a basis for designating the 
total test performance as poor. The chi square 
(Table 4) of 36.08, significant well beyond 
the .01 level, indicates that the hypothesis 
that such a distribution of patients and non- 
patients could have occurred by chance is 
untenable. The patient and nonpatient groups 
are clearly distinguished by the critical point 
of three poor-score areas out of five, since 
only 5% of the nonpatient group fall into 
the more deviant or pathological area, whereas 
70% of the patient group do. 

Table 3

Percentage Distribution of Patient and Nonpatient Groups on the Basis of Good and
Poor Performance in the 5 Tests

                 Bender         Lifted                 Form perception
                 Gestalt        Weights        BW             C              Ct
Item           Poor   Good   Poor   Good   Poor   Good   Poor   Good   Poor   Good
Critical score    79            7             2             7             20
Patients       65.0   35.0   62.5   37.5   62.5   37.5   70.0   30.0   60.0   40.0
Nonpatients     2.5   97.5   15.0   85.0   15.0   85.0   42.5   57.5   27.5   72.5

Table 4

Distribution of Good and Poor Performers in the
Patient and Nonpatient Groups Based on
3 or More Poor Scores Out of 5

               Less than 3 poor scores    3 or more poor scores
Subjects       (Good performance)         (Poor performance)
Patients       12 (30%)                   28 (70%)
Nonpatients    38 (95%)                    2 (5%)
Chi square 36.08, p < .001

Based on quality of test performance and
acuteness or chronicity of illness, predictions
of improvement or nonimprovement were
evaluated against physicians' judgments of
outcome of illness at the end of 90 days. For
the acute group the hypothesis of a direct re-
lationship between quality of test perform-
ance and outcome of illness was not supported
at a significant level of confidence, as shown
by the chi square of 3.67 and p < .06. For
the chronic group the hypothesis of an in-
verse relationship between quality of test per-
formance and outcome of illness was sup-
ported at a significant level, as indicated by
the chi square of 8.38 and p < .01. The
frequency distributions based upon quality of
test performance, acute status, chronic status,
and improved-unimproved status are pre-
sented in Table 5.

In order to explore the possibility that
any one of the five scores, taken individu-
ally, could predict outcome (improved-unim-
proved) as well as all five tests taken to-
gether, the predictive efficiency of each test
was investigated. Although the BW, C, and
Ct scores of the Holtzman Form Perception
test, taken individually, predict improvement
or nonimprovement for the chronic patients
at a significantly high level of confidence, the
probability of correct prediction using the BW,
C, or Ct score individually is less than the
probability of correct prediction using all
five tests.

Table 5

Distribution of Frequencies of Acute Patients: Good
Performance and Poor Performance; Chronic Pa-
tients: Poor Performance and Good Perform-
ance; Their Improved-Unimproved Status
(On 5 Tests)

Patients                          Improved   Unimproved   Total
Acute with good performance           3           1          4
Acute with poor performance           3          17         20
   Chi square 3.67, p < .06
Chronic with poor performance         9           0          9
Chronic with good performance         2           7          9
   Chi square 8.38, p < .01


Summary 


This study tested the hypothesis that good 
perceptual test performance is negatively re- 
lated to outcome of illness (in a 90-day hos- 
pitalization period) in chronic patients and 
positively related to outcome in acute pa- 
tients. Three perceptual tests, yielding five 
scores, served as a basis for evaluating quality 
of test performance: the Bender Gestalt, the 
Holtzman Form Perception, and the Judg- 
ment of Lifted Weights. To determine the 
critical points by which to evaluate good or 
poor test performance, a nonpatient group of 
20 men and 20 women was matched by age, 
sex, and educational level to the patient group 
for evaluation of quality of performance. The 
total number of subjects used for this evalua- 
tion was 80; for prediction of outcome of ill- 
ness for the patient group, 42. Critical points 
were determined which differentiated between 
good and poor test performance of the two 
groups both in the individual tests and in the
entire test battery beyond the .01 level of
significance.


Acuteness and chronicity of illness were de-
termined by the occurrence or nonoccurrence of
previous hospitalizations, and predictions of
outcome for individual patients were based
on quality of test performance and acuteness-
chronicity status. Improved and nonimproved
status was evaluated by physicians’ judg- 
ments. Chi-square analysis revealed that the 
accuracy of prediction of outcome based on 
acuteness-chronicity and good performance- 
poor performance was significant for the 
chronic patients beyond the .01 level of con- 
fidence, but not acceptably significant for the 
acute patients. 


Received January 27, 1955. 


References

1. Carp, A. MMPI performance and insulin shock therapy. J. abnorm. soc. Psychol., 1950, 45, 721-726.

2. Carp, A. Performance on the Wechsler-Bellevue scale and insulin shock therapy. J. abnorm. soc. Psychol., 1950, 45, 127-136.

3. Columbia-Greystone Associates, Mettler, F. A. (Ed.). Selective partial ablation of the frontal cortex. New York: Hoeber, 1949.

4. Graham, Virginia T. Psychological studies of the hypoglycemia patients. J. Psychol., 1940, 10, 327-358.

5. Hales, W. M., & Simon, W. MMPI patterns before and after insulin shock therapy. Amer. J. Psychiat., 1948, 105, 254-258.

6. Helson, H., & Kaplan, S. A study of judgment in pre- and posttopectomized subjects. Unpublished manuscript, 1950.

7. Holtzman, W. H. Emotional instability and perceptual blocking induced by color distraction. Paper read at Southern Soc. Phil. and Psychol., Austin, April, 1953.

8. Kriegman, G., & Hilgard, Josephine R. Intelligence level and psychotherapy with problem children. Amer. J. Orthopsychiat., 1941, 11, 503-511.

9. Malamud, W., & Render, N. Course and prognosis in schizophrenia. Amer. J. Psychiat., 1939, 95, 1039-1057.

10. Pascal, G. R., & Suttell, Barbara. The Bender Gestalt test: quantification and validity for adults. New York: Grune & Stratton, 1951.

11. Teuber, H. L., Bender, M., & Battersby, W. S. Federation Proc., 1950, 9, 125.

12. Windle, C., & Hamwi, Violet. An exploratory study of the prognostic value of the Complex Reaction Time test in early and chronic patients. J. clin. Psychol., 1953, 9, 156-161.

13. Zubin, J., & Windle, C. The prognostic value of the Metenym test in a follow up study of psychosurgery patients and their controls. J. clin. Psychol., 1951, 7, 221-223.

14. Zubin, J., Windle, C., & Hamwi, Violet. Retrospective evaluation of psychological tests as prognostic instruments in mental disorders. J. Pers., 1953, 21, 343-355.


Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


The Relation of the Trail Making Test to
Organic Brain Damage¹


Ralph M. Reitan²


Department of Surgery (Section of Neurological Surgery), Indiana University
Medical Center


The need for short, easily administered 
tests specifically sensitive to organic brain 
damage is indicated by the considerable num- 
ber of such tests published in recent years. In 
1946, Armitage (1) presented results obtained 
with the Trail Making Test suggesting that it 
was highly efficient in differentiating patients 
with and without brain damage. This paper 
presents the results of a second attempt to 
determine the validity of the test for this 
purpose. 


Procedure 


The Trail Making Test was administered 
individually to 27 patients with brain dam- 
age and 27 patients without brain damage. 
The diagnostic distribution of the brain-
damaged patients included: brain tumor, 9;
penetrating head injury, 6; closed head in- 
jury, 3; cerebral vascular accident, 2; cere- 
bral abscess, 2; cerebral atrophy, 1; sub- 
dural hematoma, 1; temporal lobectomy for 
epilepsy, 1; dementia paralytica, 1; and con- 
genital anomaly of the brain, 1. Testing of 
the brain-damaged patients was delayed until 
maximal benefits of hospitalization had been 
obtained, and the patients were ready for dis- 
charge. All control patients were hospitalized 
also at the time of testing. The control group 
included patients with paraplegia, 13; neu- 
rosis, 6; congenital heart disease, 1; and sur- 
gery not involving the brain, 7. The patients 
were matched in pairs on the basis of sex, 
color, chronological age, and years of educa- 
tion without reference to the test results. 


1 Supported in part by a research grant from the 
James Whitcomb Riley Memorial Association. 

2 The author is indebted to Miss Elaine Tarshes
for assistance with test administration and the sta- 
tistical analysis. 


Each group of 27 subjects was composed of 
85% males and 15% females, and 93% white 
and 7% colored persons. Means for age and 
education in the brain-damaged group were 
30.22 (range, 17-62) and 10.63 (range, 4-
16), respectively. Comparable values for the
group without brain damage were 29.67 
(range, 16-60) and 10.78 (range, 4-16). 

The Trail Making Test is one of the per- 
formance subtests of the Army Individual 
Test (2). It is divided into two parts, each 
consisting of one page. Part A consists of 25 
circles distributed over the entire page and 
numbered from 1 to 25. The subject is re- 
quired to connect the circles with a pencil 
line as quickly as possible, beginning with 1 
and proceeding in numerical sequence. Part 
B consists of 25 circles, numbered 1 to 13 
and lettered from A to L. The subject is re- 
quired to connect the circles, but alternating 
between numbers and letters and taking both 
series in ascending sequence. For example, 
the subject draws a line from 1 to A to 2 to 
B, etc. The score for the test was the num- 
ber of seconds required for completion of 
each part. Administration procedure deviated 
from that recommended in the Manual in 
two respects: (a) When an error was made, 
the examiner pointed it out immediately and 
requested the subject to make the correction, 
and (b) the test was not discontinued prior
to completion regardless of how difficult it 
was for the subject or the number of errors 
made. Errors contributed to the score only
insofar as additional time was needed for
corrections.

The raw scores (in seconds) for each part
of the test were converted according to a ten-
point scale given in the Manual, with the
best performances receiving ten points. Sta-
tistical analyses were based on the converted
scores.
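The connection orders for the two parts, and the examiner's practice of stopping the subject at the first wrong circle, can be illustrated with a short Python sketch. The helper names are hypothetical, and the error check only illustrates the administration procedure described above. It is not a scoring method from the Manual, which scores by completion time.

```python
from string import ascii_uppercase


def part_a_sequence():
    """Part A: circles 1 to 25 connected in numerical order."""
    return [str(n) for n in range(1, 26)]


def part_b_sequence():
    """Part B: 13 numbers and 12 letters (A-L), alternating 1-A-2-B-...-L-13."""
    seq = []
    for n, letter in zip(range(1, 13), ascii_uppercase[:12]):
        seq += [str(n), letter]
    seq.append("13")
    return seq


def first_error(path, target):
    """Index of the first circle in `path` that departs from `target`, or None.

    Only the circles drawn so far are compared, mirroring an examiner who
    points out an error as soon as it occurs.
    """
    for i, (drawn, expected) in enumerate(zip(path, target)):
        if drawn != expected:
            return i
    return None


print(part_b_sequence()[:6])                              # ['1', 'A', '2', 'B', '3', 'C']
print(first_error(["1", "A", "2", "C"], part_b_sequence()))  # 3
```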


Results 


Table 1 presents frequency distributions of 
the sum of scores obtained on Parts A and B 
for each group. 

The mean difference of 6.26 points was 
highly significant. A t ratio of 5.86 was
obtained, calculated with the method for 
equated groups. The frequency distributions 
show the discriminating value of the Trail 
Making Test for our groups. If scores of 12 
or below were selected as suggestive of brain 
damage, five members of the control group 
and four members of the brain-damaged 
group were misclassified. This amounts to 
approximately 17% of each group. Obviously, 
this selection of a cut-off score capitalizes on 
the chance characteristics of our particular 
distributions, and should not be used as a 
basis for clinical conclusions regarding pa- 
tients from other samples. 
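The misclassification counts quoted above can be checked directly against the frequency distribution in Table 1. The Python sketch below transcribes that distribution, recomputes both group means, and counts errors at the cutoff of 12. As the author cautions, this cutoff is fitted to these particular samples. The function names are illustrative.

```python
# Part A + B converted scores, transcribed from Table 1:
# score -> (control count, brain-damaged count)
DISTRIBUTION = {
    20: (3, 0), 19: (1, 1), 18: (1, 0), 17: (3, 0), 16: (4, 0),
    15: (3, 0), 14: (4, 0), 13: (3, 3), 12: (0, 2), 11: (1, 3),
    10: (1, 3), 9: (0, 2), 8: (3, 3), 7: (0, 1), 6: (0, 1),
    5: (0, 2), 4: (0, 3), 3: (0, 0), 2: (0, 3),
}
CONTROL, DAMAGED = 0, 1  # column indices


def group_mean(column):
    n = sum(counts[column] for counts in DISTRIBUTION.values())
    total = sum(score * counts[column] for score, counts in DISTRIBUTION.items())
    return total / n


def misclassified(cutoff=12):
    """Scores at or below the cutoff are called brain damage; count errors."""
    controls_flagged = sum(c for s, (c, _) in DISTRIBUTION.items() if s <= cutoff)
    damaged_missed = sum(b for s, (_, b) in DISTRIBUTION.items() if s > cutoff)
    return controls_flagged, damaged_missed


print(round(group_mean(CONTROL), 2), round(group_mean(DAMAGED), 2))  # 14.7 8.44
print(misclassified())  # (5, 4)
```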

Frequency distributions for Parts A and B
individually are not presented, but t ratios
comparing the groups were 4.68 and 5.33, re-
spectively. Each of these ratios has a chance
probability of less than .001.

Table 1

Distribution of Trail Making Scores for Parts A
Plus B for a Group With and a Group
Without Brain Damage

             Control      Brain-Damaged
Scores       Group        Group
20             3             0
19             1             1
18             1             0
17             3             0
16             4             0
15             3             0
14             4             0
13             3             3
12             0             2
11             1             3
10             1             3
 9             0             2
 8             3             3
 7             0             1
 6             0             1
 5             0             2
 4             0             3
 3             0             0
 2             0             3
N             27            27
Mean       14.70          8.44
p < .001


Discussion 


The results substantiate Armitage’s (1) 
finding that the Trail Making Test signifi-
cantly differentiates brain-damaged and non-
brain-damaged groups. The t ratios of this
study do not achieve the magnitude of those 
reported by Armitage, but this may be a re- 
flection of our deliberate inclusion of a large 
proportion of chronically ill patients in our 
control group. 

The size of our groups is not sufficiently 
large to provide norms for assessing the per- 
formance of individual subjects in clinical set- 
tings. However, the results obtained warrant 
further testing aimed toward establishing such 
norms, particularly in consideration of the 
Trail Making Test’s advantages of brevity, 
ease in administration, and moderate cost. 


Summary 


The Trail Making Test was administered 
to groups (N = 27) of brain-damaged and
hospitalized control subjects who were closely 
matched in pairs on the basis of sex, color, 
age, and education. This test is one of the 
performance subtests of the Army Individual 
Test, and consists of two parts. Part A re- 
quires the examinee to connect with a pencil 
line 25 numbered circles printed on a sheet 
of paper, proceeding in order from 1 to 25. 
Part B consists of 13 numbered and 12 let- 
tered circles, and the subject is instructed to 
alternate between numbers and letters, going 
from 1 to A to 2 to B, etc. The tests were 
scored as the number of seconds required 
for completion. Highly significant intergroup 
mean differences were found (p < .001) on 
both parts as well as their sum. The results 
confirm Armitage’s findings that this short, 
inexpensive, and easily administered test may 
be a fairly valid indicator of certain effects 
of brain damage. 


Received February 23, 1955. 


References 


1. Armitage, S. G. An analysis of certain psycho- 
logical tests used for the evaluation of brain 
injury. Psychol. Monogr., 1946, 60, No. 1 
(Whole No. 277). 

2. Manual: Army Individual Test. War Department,
The Adjutant General's Office, 1944.










Goal-Setting Rigidity in an Ambiguous Situation¹,²


Seymour L. Zelen³
Patton State Hospital


Inherent in most conventional techniques 
for measuring level of aspiration is the op- 
portunity for the subject to make numerous 
shifts of goal level. The changing of goals in 
the light of new experience may serve as an 
operational measure of rigidity-flexibility. In 
the standard level-of-aspiration situation, the 
individual is presumed to be oriented con- 
tinuously in terms of his previous perform- 
ances. When a subject has a fairly stable 
performance with little reason to shift, the 
simple recording of his number of shifts of 
goal might seem to indicate a lack of flexi- 
bility, when actually he is responding real- 
istically. The conventional type of measure- 
ment, therefore, limits validity and imposes 
a needless handicap on experimenters who 
wish to examine rigidity. In order to provide 
more psychologically meaningful data, some 
modification of the level-of-aspiration experi- 
ment seems advisable. 

The problems of this study were: (a) to 
test the hypothesis that an unstable frame of 
reference better measures rigidity than a 
stable frame of reference, and (b) to deter-
mine the validity and reliability of rigidity
scores obtained by this method for both 
adults and children. By an unstable frame of 
reference is meant an ambiguous situation in 
which the subject cannot anchor his goals in 
terms of previous performance. 


1 This research was supported by Research Grant 
MH-301 from the National Institute of Mental 
Health of the National Institutes of Health, U. S. 
Public Health Service, and was part of the Preven- 
tive Psychiatry Project, Child Welfare Research Sta- 
tion, State University of Iowa. 

2 Read at the May 1954 meeting of the Western 
Psychological Association. 

3 The writer is particularly indebted to Dr. Eugene
E. Levitt for both initial collaboration and continued 
interest. 


Method 
Nature of the Group Level-of-Aspiration Task 


The specific tasks were adapted from the 
digit-symbol test of the Wechsler-Bellevue 
Scale, Forms I and II. Alternate Forms A and 
B of the task consisted of a series of separate 
trials, each composed of 15 digit-symbol 
blanks, administered as a digit-symbol test 
with additional instructions: 


1. This is one of many types of intelligence 
tests. 

2. Before work is started on each trial, a 
written estimate of how many spaces will be 
completed correctly has to be made. 

3. The ultimate score depends on how cor- 
rectly the estimate is made. There is no credit 
beyond the estimate, and penalties for falling 
below estimates are imposed. 


Involvement in the task was secured by de- 
scribing the situation as an intelligence test, 
so that self-esteem of the subjects would 
suffer if they failed to reach their stated 
goals. 


Treatment Conditions 


The procedure modified in this experiment 
was the time per trial. No time limit was 
stated in the instructions. In the Random 
Time Condition, the time per trial was varied 
according to a prearranged pattern, and no 
subject knew in advance what the time for 
the succeeding trial would be. Under a second 
condition it was announced that there would 
be a Set Time for each trial, and under a 
third, the subjects were informed that there 
would be progressive increments of time from 
trial to trial, or an Increasing Time Condition.

The Random Time Condition of the test 
was administered with all of the trials on one
page so that the subject had all his previous
experience with the test immediately avail- 
able. The Set Time and Increasing Time Con- 
ditions were given with the digit-symbol trials 
on alternate pages of a test booklet. 


Measures 


Measures derived from the group level-of- 
aspiration task. Since this experiment was 
directly concerned with rigidity, the primary 
score derived from the group level-of-aspira- 
tion task was the number of shifts of stated 
goal or estimate made during the course of 
the experimental period. A rigidity index was 
also developed which consisted of an additive 
combination of the total number of shifts 
with the total amount shifted on all the trials. 
The combined rigidity measure was used be- 
cause it is not merely the number of times an 
individual shifts that is a measure of his ri-
gidity, but also the amount or extent to which
he is willing to shift. Thus one individual may 
make many shifts, each of a very small de- 
gree, whereas another subject may make the 
same number of shifts with each shift greater. 
These two individuals would not be equally 
flexible despite an equal number of shifts. In 
addition, this rigidity index yielded scores 
which distributed themselves more normally 
than simple number of shifts, an advantage 
in any psychological instrument. The goal- 
discrepancy score was another measure used 
in this investigation, but it was utilized for 
the evaluation of the total goal-setting situa- 
tion, i.e., of the technique as a whole, rather 
than as an additional measure of rigidity. 
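The rigidity index can be sketched from a subject's sequence of goal estimates. The index combines the number of times the stated goal changes with the total absolute amount it changes. The text says only that the combination is "additive", so the equal-weight sum below is an assumption. Low values mean few, small shifts and so fall at the rigid end of the scale. The two subjects in the example are hypothetical and mirror the case described above: the same number of shifts, but shifts of different size.

```python
def shift_measures(estimates):
    """Return (number of shifts, total absolute amount shifted) across trials."""
    changes = [later - earlier for earlier, later in zip(estimates, estimates[1:])]
    n_shifts = sum(1 for delta in changes if delta != 0)
    total_shift = sum(abs(delta) for delta in changes)
    return n_shifts, total_shift


def rigidity_index(estimates):
    """Additive combination of number of shifts and total amount shifted.

    ASSUMPTION: the two components are added with equal weight.
    Lower values (few, small shifts) correspond to greater rigidity.
    """
    n_shifts, total_shift = shift_measures(estimates)
    return n_shifts + total_shift


small_shifter = [10, 11, 10, 11, 10]  # four shifts of 1 each
large_shifter = [10, 13, 10, 13, 10]  # four shifts of 3 each
print(rigidity_index(small_shifter), rigidity_index(large_shifter))  # 8 16
```

Both hypothetical subjects shift four times, but the index separates them, which is the distinction the combined measure was meant to capture.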

Measures derived from the Rotter Level of 
Aspiration Board (3). As for the group level- 
of-aspiration task, number of shifts of goal 
was the measure of rigidity, and the goal-dis-
crepancy score was the measure used to assess
level of aspiration or goal-setting behavior
on the Rotter board.

The California Ethnocentrism Scale, Form
78, is a 14-item questionnaire designed to
measure the tendency to accept “the cultur- 
ally alike” and reject the “unlike” (1). 

The California Authoritarianism Scale, Form 
78, also referred to as the F or Fascism Scale 
(1), is a 38-item questionnaire designed to 
measure acceptance of authoritarian attitudes 
and standards. 





Seymour L. Zelen 


The Children’s Authoritarianism Scale. As
a corollary to the F Scale, this test was used
with children. It is a 24-item scale, designed
by Gough et al. (2), to measure antidemocratic
and authoritarian attitudes.

The Short Form of the Wesley Rigidity 
Scale. This is a 13-item questionnaire ab- 
breviation of the R Scale, consisting of 
MMPI-like items, which has been found to 
be a discriminative measure of rigidity (5, 7). 


Procedure 
Comparison of the Frames of Reference 


It was predicted that the technique making 
use of the unstable frame of reference would 
provide the most valid measure of rigidity, 
i.e., would best discriminate between subjects 
considered highly rigid in terms of a criterion 
measure, such as the Short Form of the Wes- 
ley Rigidity Scale (7). In addition to this 
form of concurrent validity, the technique 
which permitted the maximum tendency to 
rigidity to appear would also correlate most 
highly with measures of ethnocentrism and 
authoritarianism, such as the California E 
and F Scales (1). 

To test these hypotheses three groups of 
subjects were given the group level-of-aspira- 
tion task. Each group was tested under a dif- 
ferent time condition, but all were adminis- 
tered the Wesley Rigidity Scale, and the Cali- 
fornia Ethnocentrism and Authoritarianism 
Scales. Correlations between scores from the 
different time conditions and each of the 
scales were obtained. 


Standardization of the Random Time Form 


Correlation with the Rotter board. Another 
validation of the group level-of-aspiration 
task, both as a measure of rigidity and as an 
adequate measure of goal-setting behavior, 
would be a substantial positive correlation 
with a standard individual technique, such as 
the Rotter board (3). 

Correlation with the Children’s Authori- 
tarianism Scale. Correlations between the A 
Scale and the technique, closely similar to 
those obtained with the F Scale, would be 
further evidence of its validity. 

Reliability of the Random Time form. Alternate
forms of the group level-of-aspiration
test were developed in an attempt to obtain
some measure of reliability. Form A and Form 
B were administered in a counterbalanced or- 
der, and a test of the significance of the differ- 
ences between the two supposedly equivalent 
forms was performed. A split-half, Kuder-Richardson
test of reliability was also performed
using Form A. Finally, a t test was
performed to test the significance of the difference
between means of the first administration
of both forms.


Subjects 


Three groups of subjects were used in the 
comparison of the different frames of refer- 
ence, one for each time condition. All of the 
subjects used in this phase of the study were 
taken from undergraduate courses at the 
State University of Iowa. Seventy-four sub- 
jects were tested using the Set Time Condi- 
tion, 78 under the Increasing Time Condition, 
and 47 under the Random Time Condition. 
All subjects were given the Wesley Rigidity 
Scale and the California Ethnocentrism and 
Authoritarianism Scales. 

In the standardization of the random time 
technique, 84 sixth-grade children were used 
to correlate Form A with the Children’s Au- 
thoritarianism Scale. Seventeen adults were 
administered both Form A and the Rotter 
board. 

Both children and adults were employed as 
subjects in the development and evaluation 
of alternate forms. Seventy-four fifth- and 
sixth-grade children and 44 undergraduates at 
the State University of Iowa were adminis- 
tered both Form A and Form B of the level- 
of-aspiration test in a counterbalanced order. 
Test protocols of 91 adults who had been 
given Form A were used to obtain the Kuder- 
Richardson reliability. 


Results 
Treatment Conditions: Frames of Reference 


When the numbers of shifts made under the
different treatment conditions were compared,
there was little difference between them. The
means were 6.01, 6.82, and 6.48 for the Set,
Increasing, and Random Time Conditions, respectively.
The standard deviations were 2.28,
2.30, and 2.13, respectively.

The correlations of the number of shifts
yielded by the Set Time and Increasing
Time Conditions with the E and F Scales
and with the Short Form of the Wesley R
Scale proved to be nonsignificant, as shown in
Table 1.


Table 1

Relation of Treatment Conditions to Criterion Measures

Condition          E scale    F scale    R scale
Set time           -0.23      0.12       -0.11
Increasing time    -0.10      0.19       -0.17
Random time        0.31*      0.29*      0.44**

Multiple R, E and R scales, random time: 0.48**
Multiple R, F and R scales, random time: 0.46**

* Significant at the .05 level.
** Significant at the .01 level.


Under the Random Time form, significant
correlations were found with both the F and
E Scales as well as with the Short R Scale.
There was a correlation of .31 between the
number of shifts and the E Scale, and of
-.29 between number of shifts and the F
Scale, both significant at the 5 per cent level
of confidence. A correlation of -.44 between
number of shifts and the R Scale was significant
at the .01 level.
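The significance levels reported for the Random Time Condition can be checked with the usual t transformation of a Pearson r, t = r·sqrt(N - 2)/sqrt(1 - r²) with N - 2 degrees of freedom. The sketch uses N = 47 (the Random Time group) and approximate two-tailed critical values for 45 df (about 2.01 at .05 and 2.69 at .01). Those critical values are assumptions read from standard t tables, not figures from the article.

```python
import math


def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_RANDOM = 47     # subjects under the Random Time Condition
T_CRIT_05 = 2.01  # approx. two-tailed critical t, df = 45 (assumed from tables)
T_CRIT_01 = 2.69  # approx. two-tailed critical t, df = 45 (assumed from tables)

for label, r in [("E scale", 0.31), ("F scale", -0.29), ("R scale", -0.44)]:
    t = t_for_r(r, N_RANDOM)
    if t > T_CRIT_01:
        level = ".01"
    elif t > T_CRIT_05:
        level = ".05"
    else:
        level = "n.s."
    print(f"{label}: r = {r:+.2f}, t(45) = {t:.2f}, significant at {level}")
```

The resulting t values (about 2.19, 2.03, and 3.29) agree with the levels given in the text; the F Scale value only just clears the .05 criterion.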

When the scores of both the E Scale and 
the R Scale were used to predict the number 
of shifts, a multiple-correlation coefficient of
.48 resulted. A multiple R of .46 was provided
by the scores of the F and R Scales
predicting number of shifts.
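For two predictors, the multiple correlation follows from the three zero-order correlations: R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12)/(1 - r_12²). The article does not report the correlation between the E and R Scales, or between the F and R Scales. The sketch below therefore takes r_12 as a parameter and is illustrated with hypothetical values only. Signed correlations should be entered with their signs.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlations of y with predictors 1 and 2.
    r_12: correlation between the two predictors (requires |r_12| < 1).
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical inputs: with uncorrelated predictors, R^2 is the sum of the r^2 values.
print(round(multiple_r(0.3, 0.4, 0.0), 3))  # sqrt(0.09 + 0.16) = 0.5
# Redundant (intercorrelated) predictors add less than their separate r^2 values.
print(round(multiple_r(0.5, 0.5, 0.5), 3))
```

When the two scales overlap heavily, R rises only a little above the larger single correlation. This is consistent with multiple Rs of .48 and .46 alongside a zero-order r of .44 for the R Scale.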


Results of Standardization Studies 

The correlation between the Random Time
group level of aspiration and the Children’s
Authoritarianism Scale was -.22, in the predicted
direction and significant at the .05
level. Different measures of authoritarianism
were used for the two age levels. With adults
the coefficient of correlation between F and
level of aspiration was -.29; using the Children’s
Authoritarianism Scale, an r of -.22
resulted. The difference between the two correlations
is not significant.
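The article does not say which test was used to compare the adult and child correlations. One standard choice is Fisher's r-to-z transformation for two independent correlations. The sketch below applies it using the sample sizes given under Subjects: 47 adults in the Random Time group and 84 sixth-grade children. It illustrates one way to make the comparison and is not necessarily the author's procedure.

```python
import math


def compare_independent_rs(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations,
    using Fisher's r-to-z transformation (z = atanh(r))."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se


# Adults (F Scale, Random Time group) vs. sixth-grade children (Children's A Scale).
z = compare_independent_rs(-0.29, 47, -0.22, 84)
print(f"z = {z:.2f}")  # |z| is about 0.4, far below 1.96
```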







Comparison of the Random Time level-of- 
aspiration technique with the Rotter board 
resulted in a correlation for the measures of 
rigidity of .49, significant at the .05 level. 
For the same comparison of the goal-discrepancy (D) scores, the
correlation was .59, significant beyond the
.02 level.


Reliability of the Random Time Form 


Comparison of rigidity index scores of 
Forms A and B of the level-of-aspiration test 
yielded a Pearsonian coefficient of correlation 
of .78 for the children and .81 for the adults, 
both highly significant. A split-half, Kuder- 
Richardson reliability of .72 was obtained. 

The scores of the first administration of 
each form were tested for differences between 
means, and yielded a t of .006. This resulted
in a p greater than .9, which supported the 
null hypothesis of no difference between 
forms, as Table 2 shows. 


Table 2

Rigidity Index Scores for Alternate Forms of the
Ambiguous Group Level-of-Aspiration Test

Test sequence     Administration I    Administration II    Difference (I minus II)
A-B               Form A              Form B
  Mean            14.45               17.27                -2.73
  SD              5.05                6.26                 6.52
B-A               Form B              Form A
  Mean            14.53               14.32                +0.21
  SD              3.52                6.27                 4.31





Rigidity index difference scores were then
obtained for the two counterbalanced orders
of administration (Form A minus Form B,
and Form B minus Form A) to determine
whether the form given first affected learning
on the form given second. Mean
differences between the two administrations
were -2.73 and +.21. This yielded a t of
1.63, significant only between the .2 and .1
levels, which was not sufficient to reject
the null hypothesis that neither form had
greater learning effects. An over-all analysis
of variance also yielded no significant difference
between the means.


Discussion 


The Set Time and Increasing Time Conditions
failed both to discriminate between subjects
of differing degrees of rigidity and to affect
the scores in any differential manner.
This failure can perhaps be explained by the 
fact that the entire form of the task created 
such a structured situation that there was 
only one way to respond. Both treatment con- 
ditions provided such a constant frame of ref- 
erence that any realistically functioning indi- 
vidual would be forced to view the situation 
as one in which he should get progressively 
better and should thereafter alter his score. 

The results of cross validation indicate that 
a fairly high positive relationship exists be- 
tween the rigidity measure derived from the 
group level-of-aspiration task under the Ran- 
dom Time-Ambiguous Condition and two ex- 
ternal measures of rigidity, the Short Form 
of the Wesley Rigidity Scale and the rigidity 
measures of the Rotter board. Predictive va- 
lidity was also achieved with significant posi- 
tive correlations between the measures of 
ethnocentrism and authoritarianism and the 
rigidity measure of this test. When the am- 
biguous level-of-aspiration task was used with 
children, the results tended to support the 
hypothesis of a relationship between goal- 
setting rigidity and a measure related to ri- 
gidity, such as authoritarianism. Thus, within 
the limits of the relationship between the 
adult F Scale and the Children’s Authori- 
tarianism Scale, it is possible to conclude that 
the instrument measures comparable behavior 
for adults and children. It may further be 
concluded that the group level-of-aspiration 
technique under the Random Time Condition 
is a relatively valid measure of rigidity, pre- 
dicting as much of the interrelationships of 
rigidity as current hypothesizing indicates it 
should. 

Only when sufficient ambiguity existed in 
the situation, an uncertainty as to what to 
anticipate in the succeeding trials such as oc- 
curred with the random time treatment, could 
idiosyncratic motivations be involved to a 
significant degree. In the less structured level- 
of-aspiration task, it can be postulated that 
the same need systems operating in the ri- 
gidity, authoritarianism, and ethnocentrism 
questionnaires will be available to the individual.
This study presents evidence that ambiguity,
in terms of frame of reference for
estimate, serves to maximize personal tendencies
to flexibility or rigidity.

Since a great deal of learning takes place 
in the digit-symbol task, it is not possible to 
readminister the same form within a rela- 
tively short interval and obtain comparable 
results. But the high Kuder-Richardson re- 
liability and the high correlation between the 
two forms indicate that a reasonably stable 
measure has been obtained, and that differ- 
ences reflected on the test are not merely the 
result of chance variations of the instrument. 


Summary 


The use of an ambiguous frame of reference 
in a level-of-aspiration situation seemed to 
provide sufficient uncertainty to maximize 
personal tendencies to rigidity or flexibility. 
Of three experimentally provided frames of 
reference, only the ambiguous frame of ref- 
erence provided scores which correlated sig- 
nificantly with a criterion measure of rigidity, 
and which had significant relationships with 
measures of ethnocentrism in adults and authoritarianism
in both adults and children.
The two equivalent forms that were developed
did not differ significantly.
Learning on one form seemed to affect scores 
on the other about equally in terms of im- 
provement and variability. These two forms 
correlated .78 for children and .81 for adults, 
indicating that a relatively stable measure 
had been obtained. 


An index or score of rigidity was developed 
and used which was not merely a numerical 
description of the quantity of shifting, but 
also a statement of the quality or total ex- 
tent of the shifts. 

It may be concluded that an approach to 
rigidity in goal-setting situations can more 
profitably employ an ambiguous rather than 
a stable frame of reference. 


Received January 26, 1955. 


References 


1. Adorno, T. W., Frenkel-Brunswik, Else, Levinson, D. J., & Sanford, R. N. The authoritarian personality. New York: Harper, 1950.

2. Gough, H. G., Harris, D. B., Martin, W. E., & Edwards, M. Children’s ethnic attitudes: I. Relationship to certain personality variables. Child Develpm., 1950, 21, 83-91.

3. Rotter, J. B. Level of aspiration as a method of studying personality: III. Group validity studies. Charact. & Pers., 1943, 11, 255-274.

4. Rotter, J. B. Level of aspiration as a method of studying personality: IV. The analysis of patterns of response. J. soc. Psychol., 1945, 21, 159-177.

5. Wesley, Elizabeth B. Perseverative behavior in a concept formation task as a function of manifest anxiety and rigidity and of punishment. Unpublished doctor’s dissertation, State Univer. of Iowa, 1950.

6. Zelen, S. L., & Levitt, E. E. A group level of aspiration test as a measure of personality rigidity. Proc. Ia. Acad. Sci., 1953, 60, 569-573.

7. Zelen, S. L., & Levitt, E. E. Notes on the “Wesley Rigidity Scale”: The development of a short form. J. abnorm. soc. Psychol., 1954, 49, 472-474.








Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 





A Comparison of Raven's Progressive Matrices (1938) 
with the ACE Psychological Examination and the 
Otis Gamma Mental Ability Test¹


Byron J. Bolin 


Central State Hospital, Lakeland, Ky 


The range of applicability of Raven’s Pro- 
gressive Matrices (PM) is not yet fully un- 
derstood. A major question is what functions 
the test samples. Another problem is the de- 
gree of equivalence between the PM and 
standard scales of general intelligence. Com- 
parison of PM with tests consisting of sec- 
tions of homogeneous, well-understood items 
can contribute to understanding of the first 
question. Examination of correlations between 
PM and various “general intelligence’’ scales 
can help to solve the second problem. The 
widely used American Council on Education 
Psychological Examination for College Fresh- 
men (ACE) is made up of sections of seem- 
ingly homogeneous items sampling rather spe- 
cific functions. The Otis Gamma Mental 
Ability Test is popular as a group test of 
general intelligence. 

Seventy-six junior-college students were 
given on separate occasions and under simi- 
lar group conditions the ACE, the Otis 
Gamma, Form D, and the PM. Subjects’ 
mean age was 18.7 years, and there were 72 
females and 4 males. The central tendency 
data: PM, M 48.15, SD 7.12; Otis, M 47.82, 
SD 10.72; ACE(T) (total) score, M 87.26, 
SD 27.88; ACE(L) (“linguistic”), M 53.49, 
SD 17.56; ACE(Q) (“quantitative”), M 
33.65, SD 12.06. The r’s: PM with Otis, .65;
PM with ACE(T), .48; PM with ACE(L),
.29; and PM with ACE(Q), .59.

¹ An extended report of this study may be obtained without charge from Dr. Byron J. Bolin, Central Hospital, Lakeland, Kentucky, or for a fee from the American Documentation Institute. Order Document No. 4624 from ADI Auxiliary Publication Service, Library of Congress, Washington 25, D. C., remitting $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

The low PM-ACE(L) r tends to support 
the prevailing opinion that PM best meas- 
ures nonlinguistic areas of intelligence. The 
ACE(Q) tests are Arithmetical Reasoning, 
Number Series, and Figure Analogies. These 
sections heavily involve computation, recog- 
nition of complex spatial and numerical rela- 
tionships, and of course the capacity for sus- 
tained concentrated effort. These abilities are 
generally conceded to be especially vulner- 
able to pathology. The ACE(L) tests relate 
closely to range and availability of vocabu- 
lary, with one of the three (Verbal Analogies) 
adding more complex verbal functions to the 
items. Vocabulary, of course, is considered 
relatively resistant to impairment. Published 
studies show higher correlation of PM with 
Wechsler-Bellevue (WB) subtests considered 
particularly sensitive to brain damage than 
with subtests less so. Hence PM alone ap- 
pears unsuitable for assessing “original en- 
dowment” in clinical cases but possibly use- 
ful in estimating loss. 

The shape of the distribution of our undergraduates’
PM scores resembles the one Notcutt
(Brit. J. Psychol., 1949, 40, 68-70) obtained
from his English subjects. His Zulu
subjects gave a distribution skewed oppositely.
Similarly, we found an excessive frequency
of rural or poorly educated employment
applicants earning a subnormal PM
and a normal WB brief-form score. Level of
general attainment appears to be reflected
appreciably in the PM.


Brief Report 
Received May 13, 1955.



















Journal of Consulting Psychology 
Vol. 19, No. 5, 1955 


The Taylor Manifest Anxiety Scale and Intelligence 


Mark S. Mayzner, Jr., Eugene Sersen, and M. E. Tresselt 


New York University 


Numerous recent studies have employed 
the Taylor scale as a measure of manifest 
anxiety with the assumption being made that 
variations in anxiety scores should reflect 
systematic variations in motivational or drive 
level. If the Taylor scale does not measure 
anxiety accurately, but is confounded with 
intelligence, doubt might be entertained con- 
cerning the validity of much of the work done 
by Spence and his group (4, 12, 13, 14, 15, 
16, 18) and others (7, 10, 11) with this test. 

Evidence against the Taylor scale is forth- 
coming from two recent articles. Both Grice 
(6) and Kerrick (8) have found significant 
negative correlations between the Taylor 
scale and intelligence. Thus Grice says “The 
possibility of a relation between anxiety 
measures and intellectual variables should be 
considered in the design of future experi- 
ments of this type” (6, p. 74). Further, “One 
rather important implication of the present 
finding is that it raises some question con- 
cerning the validity of other studies involv- 
ing groups selected in a similar manner” (6, 
p. 73). Kerrick is more positive in her re- 
marks stating that “The data reported here 
indicate that it is virtually impossible to se- 
lect extreme subjects on the Taylor scale who 
are equated in intelligence. It is suggested 
that differences in learning which have been 
attributed to anxiety alone may well be 
merely the result of differences in ability or 
IQ or may be the result of the interaction 
between IQ and anxiety” (8, p. 77). 

The indictments of Grice and Kerrick, if 
true, indeed create doubts about the find- 
ings of a great many experiments. Since the 
studies by Grice and Kerrick employed Air 
Force trainees in group testing situations, it 
was felt that additional correlations between 
anxiety and intelligence scores should be obtained
with college subjects, both in group
and individual testing situations. 


Method 


Fifty-five subjects enrolled in a “How-to- 
Study” course at New York University over 
the past few years took both the full Wech- 
sler-Bellevue (W-B) (individual testing) and 
the Taylor scale (17). A different group of
145 subjects (entering freshmen at N.Y.U.
over the past few years) took the American 
Council on Education Psychological Exami- 
nation (ACE) (group testing) and the Tay- 
lor scale. 


Results and Discussion 


Table 1 shows the results of this study and
others which have correlated the Taylor anxiety
scores with intelligence scores. It can be
seen that these correlations vary from -.40
to +.19. In particular, the findings of this
study yield correlations between anxiety and
intelligence scores which are all positive but
nonsignificant. The ACE and anxiety correlations
were calculated separately for males
(+.18), for females (+.09), and for the top
and bottom 16.5% of the anxiety scale distribution
(+.17), the last to be comparable to Grice’s
anxiety and intelligence correlation, in which
he employed only the extremes of his distribution.
Although the W-B and anxiety
scores show the highest correlation (+.19)
in this study, it still is not significant. Frick
(5) also reports results in general agreement
with these findings. He obtained a correlation
of -.07 (N = 267 females) between ACE
and psychasthenia (Pt) scores from the
Minnesota Multiphasic Personality Inventory
(MMPI). Since Pt correlated very highly
with the Taylor scale (Brackbill and Little
[2] and Eriksen and Davids [3] both found
correlations of approximately +.92 between
the Taylor and Pt scales), it could be assumed
that Frick would have obtained approximately
the same correlation had he, in fact, used the
Taylor scale. It can be noted that Frick’s
correlation of -.07, obtained with female
subjects only, is very similar to the correlation
of +.09 found in this study with female subjects,
and less similar to our correlation of +.18
found with male subjects.

Table 1

Correlations Between Anxiety and Various Measures of Intelligence

Intelligence test                      r with Taylor   Subjects                          Experimenters
                                       Anxiety Scale
Wechsler-Bellevue, range 93-130 IQ     +.19            45 males, 10 females              Mayzner et al.
ACE (1946 ed.), range 58-173 points    +.18            90 males                          Mayzner et al.
                                       +.09            55 females
                                       +.14            total, 145 subjects
                                       +.17            24 highest and 24 lowest anxiety
Grade Point Average                    -.08            101 college sophomores            Matarazzo et al.
Comprehension, Vocabulary, &           -.07            101 college sophomores            Matarazzo et al.
  Similarities of W-B
ACE (1949 ed.)                         -.25**          101 college sophomores            Matarazzo et al.
Clerical Aptitude Index                -.40**          60 Air Force trainees             Grice
AFQT                                   -.20*           128 Air Force trainees            Kerrick

* p < .05.
** p < .01.
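Whether a correlation of a given size reaches significance depends on N. The sketch below converts a critical t into the smallest |r| that would be significant, r_min = t / sqrt(df + t²) with df = N - 2. It uses approximate two-tailed .05 critical t values read from standard tables (about 2.006 for 53 df and 1.977 for 143 df). Those values are assumptions, not figures from the article.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit**2)


# (label, observed r, N, approximate two-tailed .05 critical t from tables)
cases = [
    ("W-B, N = 55", 0.19, 55, 2.006),
    ("ACE, total N = 145", 0.14, 145, 1.977),
]
for label, r, n, t_crit in cases:
    r_min = critical_r(t_crit, n)
    verdict = "significant" if abs(r) >= r_min else "not significant"
    print(f"{label}: r = {r:+.2f}, needs |r| >= {r_min:.2f} -> {verdict}")
```

With 55 subjects, a correlation must reach roughly .27 to be significant at .05, so the W-B value of +.19 falls short. With 145 subjects, the threshold is about .16, above the total ACE value of +.14.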

Matarazzo (9) has obtained results in par- 
tial agreement with our findings and in partial 
agreement with the results of Grice and Kerrick.
He obtained small and nonsignificant correlations
between Grade Point Averages and
Taylor scores (-.08) and between CVS (a score derived
from the comprehension, vocabulary,
and similarities scales of the W-B given in
group administration) and Taylor scores
(-.07), whereas the ACE and Taylor scores
yielded a significant negative correlation of
-.25. Matarazzo feels that his one significant
and two nonsignificant correlations may
be explained in terms of the time pressure, 
presumably acting differentially on the high- 
anxiety subjects. Thus the two insignificant 
correlations were obtained under nontimed 
test conditions, while the ACE and anxiety 
correlation was obtained under timed test 
conditions. 

The results of the present study would 
seem to indicate that the timed aspect of the 


ACE alone is not the crucial element in yield- 
ing a significantly negative correlation, since 
our findings yielded an insignificant positive 
correlation. Perhaps the determining variable 
in producing a positive or a negative correla- 
tion involves at least two general factors: (a) 
the amount of threat to the individual aroused 
by the situation and (b) the differential
effect of this threat on high- and low-anxious
subjects with the high-anxious subjects show- 
ing the greater effect. Beier (1) has shown, 
for example, that experimentally induced 
threat produces a significant decrement in 
performances on various intellectual measures, 
and Spence, Farber, and Taylor (15) have 
more recently shown that the effect of shock 
threat acts differentially on high- and low- 
anxious subjects. Thus, on the basis of the 
results of the studies by Beier, and Spence, 
Farber and Taylor, it might be predicted 
that if Beier’s experiment were to be repeated 
with the effect of threat analyzed separately 
for high- and low-anxious subjects, the high- 
anxious subjects should react differentially 
with greater performance decrements to 
threat than the low-anxious subjects. 

If this analysis is correct, then the inconsistency
of the correlations shown in Table 1
becomes more understandable. It is quite likely
that Air Force trainees taking intelligence 
tests would feel threatened, since good scores 
probably would mean advancements and promotions
whereas poor scores would not lead
to such desirable results. Thus the threat 
present in the studies by Grice and Kerrick 
could produce significant negative correla- 
tions. In the present study probably no threat 
was present and the result was an insignifi- 
cant positive correlation. Matarazzo’s study 
also shows insignificant correlations for those 
conditions in which threat probably was mini- 
mum. His only significant negative correla- 
tion could be attributed to threat being pres- 
ent in the ACE testing situation which for 
unknown reasons was absent from the ACE 
testing situation at N.Y.U. 

Even if the preceding analysis is partially 
or totally incorrect, the results of Table 1 do 
not seem to warrant the implications drawn 
by Grice and Kerrick. That is, when the W-B 
IQ test shows no correlation with the Taylor 
anxiety scores and the ACE shows one sig- 
nificant and one insignificant correlation with 
Taylor anxiety scores, the evidence does not 
seem to favor the view of a significant nega- 
tive relationship being present independent of 
the testing conditions. 


Summary 


The present study obtained correlations be- 
tween the Taylor anxiety scale and both the 
W-B (individual testing) and ACE (group 
testing) for a college population. All correla- 
tions obtained were positive but insignificant. 
It was suggested that, on the basis of these
findings, the conclusion of Grice and Kerrick
that a significant negative correlation
exists between anxiety and intelligence scores
is at present limited to specific testing
conditions.


Received May 25, 1955. 
Early Publication. 
References 


1. Beier, E. G. The effect of induced anxiety on flexibility of intellectual functioning. Psychol. Monogr., 1951, 65, No. 9 (Whole No. 326).

2. Brackbill, G., & Little, K. B. MMPI correlates of the Taylor Scale of Manifest Anxiety. J. consult. Psychol., 1954, 18, 433-436.

3. Eriksen, C. W., & Davids, A. The meaning and clinical validity of the Taylor Anxiety Scale and the Hysteria-Psychasthenia scales from the MMPI. J. abnorm. soc. Psychol., 1955, 50, 135-137.

4. Farber, I. E., & Spence, K. W. The relationship of anxiety level to performance in serial learning. J. exp. Psychol., 1952, 44, 61-64.

5. Frick, J. W. Improving the prediction of academic achievement by use of the MMPI. J. appl. Psychol., 1955, 39, 49-52.

6. Grice, G. R. Discrimination reaction time as a function of anxiety and intelligence. J. abnorm. soc. Psychol., 1955, 50, 71-74.

7. Hilgard, E. R., Jones, L. V., & Kaplan, S. J. Conditioned discrimination as related to anxiety. J. exp. Psychol., 1951, 42, 94-99.

8. Kerrick, Jean S. Some correlates of the Taylor Manifest Anxiety Scale. J. abnorm. soc. Psychol., 1955, 50, 75-77.

9. Matarazzo, J. D., Ulett, G. A., Guze, S. B., & Saslow, G. The relationship between anxiety level and several measures of intelligence. J. consult. Psychol., 1954, 18, 201-205.

10. Mayzner, M. S., Jr., & Tresselt, M. E. The effect of the competition and generalization of sets with respect to manifest anxiety. J. gen. Psychol., in press.

11. Mayzner, M. S., Jr., & Tresselt, M. E. Concept span as a composite function of personal values, anxiety and rigidity. J. Pers., in press.

12. Montague, E. K. The role of anxiety in serial rote learning. J. exp. Psychol., 1953, 45, 91-96.

13. Ramond, C. K. Anxiety and task as determiners of verbal performance. J. exp. Psychol., 1953, 46, 120-124.

14. Spence, K. W., & Farber, I. E. Conditioning and extinction as a function of anxiety. J. exp. Psychol., 1953, 45, 116-119.

15. Spence, K. W., Farber, I. E., & Taylor, Elaine. The relation of electric shock and anxiety to level of performance in eyelid conditioning. J. exp. Psychol., 1954, 48, 404-408.

16. Taylor, Janet A. The relationship of anxiety to the conditioned eyelid response. J. exp. Psychol., 1951, 41, 81-92.

17. Taylor, Janet A. A personality scale of manifest anxiety. J. abnorm. soc. Psychol., 1953, 48, 285-290.

18. Taylor, Janet A., & Spence, K. W. The relationship of anxiety level to performance in serial learning. J. exp. Psychol., 1952, 44, 61-64.





Books 


Aldrich, C. Knight. Psychiatry for the family phy- 
sician. New York: McGraw-Hill, 1955. Pp. ix + 
276. $5.75. 
This book is written for the man in general practice,
and attempts to coach him in the handling of
the numerous psychological and psychiatric prob- 
lems he is sure to meet. Discussion of general prin- 
ciples of emotional growth and development is 
modifiedly Freudian, and reminiscent of mental hy- 
giene manuals. The book is written at a very non- 
technical level. In his chapters on such treatment as 
the general physician can undertake, the author’s 
emphasis is on “clarification” which appears to be 
nondirective therapy, although he does not use the 
term nor list any of the nondirective literature in his
bibliography. He notes that some psychologists may 
be helpful for vocational advice, which the physician 
is not equipped to give, may aid in diagnosis of 
mental deficiency, and finally, “The laboratory, be- 
sides helping to rule out organic disease, may make 
positive contributions to the diagnosis of emotional 
illness through psychological tests. As a general rule 
the usefulness of personality tests is in proportion to 
their complexity; the best require the psychologist’s 
special training. The so-called projective tests must 
be administered, scored, and interpreted by a psy- 
chologist; the physician can administer written or 
card-sorting tests such as the Minnesota Multiphasic
Personality Inventory, but unless he has had special
training or at least an unusual background of
reading, he should send the tests to a psychologist
for scoring and interpretation.”—A. R. 


Almy, Millie. Child development. New York: Holt,
1955. Pp. xviii + 490. $4.50.

Almy’s book is an excellent example of a signifi- 
cant new trend in texts in child psychology, a trend 
of decided interest to clinical psychologists. No 
longer limited to descriptive and normative topics, 
child psychology has become increasingly dynamic 
and social, giving emphasis to the development of 
adjustments and personality, and to the relationships 
of children with their parents and peers. As a frame- 
work for describing child development, Almy uses 
case studies of six normal but very different children, 
first seen as eighteen-year-olds graduating from high 
school, and then traced from birth through adolescence.

Note: The reviews were prepared by the Editor
and the Advisory Editors, who may be identified by
their initials.

The presentation is organized in terms of developmental
levels, and suborganized in descriptions
of the biological, social, and dynamic aspects of each 
period of growth. The author’s lucid and mainly 
nontechnical style of writing does not conceal her 
breadth of scholarship. Research studies are used 
and documented, but do not intrude themselves be- 
tween the student and his understanding of the broad 
features of child growth. This is a book that psy 
chologists will like, and that students will use profit- 
ably —L. F. S. 


Basowitz, Harold, Persky, Harold, Korchin, Sheldon 
J., & Grinker, Roy R. Anxiety and stress. New 
York: McGraw-Hill, 1955. Pp. xv + 320. $8.00. 
This is mainly a report of an interdisciplinary
study of healthy young men under the stress of
paratroop training. In three experiments with small
groups, randomly selected men (N = 30), nonjump-
ing paratroopers (N = 15), and men highest (N = 10)
and lowest (N = 10) in hippuric acid were
thoroughly tested and studied from a psychological
and biochemical standpoint. The paratroopers showed
comparatively little anxiety, and it was related more 
to anticipation of failure than to destruction. Many 
interesting findings are reported in the volume. Bio- 
chemical variables distinguish passing and failing 
subjects more clearly than do the psychological ones.
There seems to be a hierarchy of stress sensitivity 
among the biochemical variables. High hippuric acid 
individuals are more anxiety-prone and judged to be 
less able to withstand stress. Little generality of 
variation is found among different test procedures 
used. The relationships between initial assessment 
and later performance under stress are not im- 
pressive. An increase in anxiety immediately fol- 
lows graduation—the end phenomenon. The authors 
briefly review the theoretical and experimental lit- 
erature on anxiety and concur that it is internally 
derived, unrelated to external threat, and highly in- 
dividual. Despite this, they chose paratroop training 
with the voluntary selection factor, the apparently 
high morale, the high percentage of nonneurotic 
men, and the clearly defined objective dangers as 
hypothetically productive of great anxiety. The study 
contributes to our understanding of biochemical and 
related psychological changes of healthy individuals 
undergoing stress in an apparently unified group. The 
importance of interpersonal relations and their ef- 
fects on the individuals before and during the study 
was mentioned but not adequately emphasized in
this report, the reviewer believes.—F. McK.








New Books 


Deutsch, Felix, & Murphy, William F. The clinical 
interview. Vol. II. Therapy. New York: Interna-
tional Universities Press, 1955. Pp. 335. $7.50.


In the June, 1955, number of this Journal, Volume 
I: Diagnosis, of The Clinical Interview by Deutsch 
and Murphy was reviewed. Volume II is the promised 
continuation and deals with sector psychotherapy, 
which is “. . . a goal-limited therapy based directly 
upon psychoanalytic principles.” The authors em- 
phasize that theirs is no competing school in psy- 
chiatry, but an aspect of psychoanalysis. Psycho- 
analysis is the only full process of therapy, and 
“any other psychotherapy, short of psycho-
analysis, can achieve a similar task only in a lim-
ited sector.” They point out again and again that,
except for limited goals, their techniques are basic-
ally psychoanalytic: “Making the unconscious con-
scious still remains the sine qua non of all psy-
choanalytically derived therapy.” The format is
essentially that of the first volume, with an excellent
opening chapter describing in clear fashion the bases
of the method, a number of recorded annotated in-
terviews illustrating the method, and a closing chap-
ter on critique and conclusions. Designed for the
training of psychiatrists, the two volumes are admir-
ably suited for advanced training of clinical psy-
chologists in the clinical interview.—M. K.


Oliven, John F. Sexual hygiene and pathology: a
manual for the physician. Philadelphia: Lippin-
cott, 1955. Pp. xiii + 481. $10.00.
Unlike most books on sex, which are mainly a
sorry lot, Oliven’s volume is thorough, objective, un-
biased, and critical. It makes use of relevant research
findings and integrates them well with clinical wis-
dom. Although the book is intended for physicians,
it can be recommended to psychologists as a satis-
factory handbook on normal sexual development,
sex education, and sex pathology.—L. F. S.

Porteus, Stanley D. The maze test: recent advances. 
Palo Alto, Calif.: Pacific Books, 1955. Pp. 71 (pa- 
per). $2.00. 

This monograph, valuable to any user of the 
Porteus Mazes, summarizes recent research on the 
test and describes the development and standardiza- 
tion of the Maze Extension Series, an alternate form. 
The second form for retesting is a welcome supple- 
ment to the Maze Test which, after forty years, is 
surely one of the most durable of psychological in-
struments.—L. F. S.


Wallin, J. E. Wallace. The odyssey of a psychologist.
Wilmington 4, Delaware (311 Highland Ave., Lyn-
dalia): Author, 1955. Pp. xvii + 243 (paper). $3.00.
While striving for its future, youthful clinical psy-
chology may also profit from a few glances at its
past. The autobiography of Wallin, one of our real
pioneers, provides a unique source book. In it we can
catch glimpses of doctoral education at Yale and
Clark at the turn of the century, of the day-by-day
work of a clinical psychologist in 1910, and of the
early evolution of special education for mental de-
fectives in Missouri and in Ohio. The book is not
only an informative history, it is even more a
warm, human story which reveals much of its au-
thor. Controversies are not avoided, and adminis-
trators who frustrated early psychological [...]
are roundly trounced. The reviewer recommends
Wallin’s autobiography as week-end reading for
those who want to combine professional in[...]
with the pleasures of numerous chuckles and an
occasional gasp.—L. F. S.


Tests 


Hertzka, Alfred F., & Guilford, J. P. Logical rea-
soning. High school—college. 1 form. [...]
Test booklet [...] (15¢ [...]); [...] (2¢); scoring
stencil (50¢ [...]); manual, pp. 4[...]. Beverly Hills,
Calif.: Sheridan Supply Co. [...]
A brief syllogism test, based on [...], presented as a
measure [...] deduction. The reliability of [...]
is of a [...] abbrevi[...] screening [...]
are based on [...] students and [...] col-
lege students, all located in [...] California [...].
[...] evidence of the validity of the published test [...],
although some moderate correlations with mathe-
matics achievement [...] are cited for a [...]
[...]lar form.” The meager data suggest that th[...]
perhaps suitable for speciali[...] [...]
is by no means ready for widespread practical ap-
plication.—L. F. S.




Tinker, Miles A. Tinker Speed of Reading Test.
College. 2 forms. 5 min. Test [...] ([...]
per 25, either form), with manual, pp. 1[...];
specimen set (50¢). Minneapolis: Univer. of Minne-
sota Press, 1947, 1955.
The Tinker Speed of Reading Test provides a
measure of the rate of sustained reading of simple
material, without difficulties of comprehension. It is
intended for research on the effect of conditions such
as illumination on speed of reading, or for [...]
use in reading clinics at the college level. [...]
pages of two-column text is sufficient to keep
rapid readers busy for [...], permitting ex-
tended testing periods. [...]
and the two forms are nearly equivalent. Norms for
5, 10, and 30-minute time limits were obtained [...]


Books Received 


Abramson, Harold A. (Ed.) Neuropharmacology.
Transactions of the First Conference, 195[...]. New
York: Josiah Macy, Jr. Foundation, 1955. Pp.
210. $4.25.

Abramson, Harold A. (Ed.) Problems of conscious-
ness. Transactions of the Fifth Conference, [...].
New York: Josiah Macy, Jr. Foundation, [...].
Pp. 180. $3.50.



American Friends Service Committee. Speak truth to 
power. Philadelphia: American Friends Service 
Committee, 1955. Pp. 71. 25¢ (paper); $1.00 
(boards). 

Bowen, Howard R. The business enterprise as a sub- 
ject for research. Pamphlet No. 11. New York: 
Social Science Research Council, 1955. Pp. viii + 
103 (paper). $1.25. 

Bush, Robert R., & Mosteller, Frederick. Stochastic 
models for learning. New York: Wiley, 1955. Pp. 
xvi + 365. $9.00. 

Cronbach, Lee J. (Ed.) Text materials in modern 
education. Urbana: Univer. of Illinois Press, 1955. 
Pp. 216 (paper). $2.50. 

Cruze, Wendell W. Psychology in nursing. New 
York: McGraw-Hill, 1955. Pp. ix + 494. $5.50.

Delay, J., Pichot, P., Lempérière, T., & Perse, J. Le
test de Rorschach et la personnalité épileptique.
Paris, France: Presses Univer. de France, 1955. Pp.
218 (paper). 1,000 fr.


New Books and Tests 


Greenacre, Phyllis. Swift and Carroll: a psycho-
analytic study of two lives. New York: Interna-
tional Universities Press, 1955. Pp. 306. $5.00.

Hare, A. Paul, Borgatta, Edgar F., & Bales, Robert 
F. (Eds.) Small groups: studies in social inter- 
action. New York: Knopf, 1955. Pp. xv + 666. 
$6.50. 

Heiser, Karl F. Our backward children. New York: 
Norton, 1955. Pp. 240. $3.75. 

Ruja, Harry. Psychology for life. New York: Mc- 
Graw-Hill, 1955. Pp. x + 427. $4.75. 

Thorne, Frederick C. Principles of psychological ex- 
amining. Brandon, Vt.: Journal of Clinical Psy- 
chology, 1955. Pp. v + 494. $6.00. 

Thorndike, Robert L., & Hagen, Elizabeth. Measure- 
ment and evaluation in psychology and education. 
New York: Wiley, 1955. Pp. viii + 575. $5.50. 

Weinstein, Edwin A., & Kahn, Robert L. Denial of 
illness: symbolic and physiological aspects. Spring- 
field, Ill.: Charles C Thomas, 1955. Pp. viii + 166.
$4.75.














Psycholinguistics
A Survey of Theory and Research Problems


Report of the 1953 Summer Seminar Sponsored by the Committee on
Linguistics and Psychology of the Social Science Research Council


Charles E. Osgood, Editor; Thomas A. Sebeok, Associate Editor
With a Foreword by John W. Gardner
John B. Carroll
Susan M. Ervin
Joseph H. Greenberg
James J. Jenkins
Floyd G. Lounsbury
Leonard D. Newmark
Sol Saporta
Donald E. Walker
Kellogg Wilson


Price, $2.50 


Order from 


AMERICAN PSYCHOLOGICAL ASSOCIATION 
4333 Sixteenth Street N. W. 
Washington 6, D. C. 



















